1.0 Project Overview

1.1 Introduction

Initially used to track and contain disease outbreaks in the 1940s (Metcalf et al., 1995), wastewater analysis has quickly gained traction as a way to monitor other public health measures such as the usage of illicit drugs (Zuccato et al., 2008). Wastewater based epidemiology (WBE) is a chemical analysis technique used to detect a myriad of compounds for regions serviced by wastewater treatment facilities. The functions of this technique range from monitoring environmental impact of household liquid waste to the very topical application of detecting the presence of Covid-19 (Glassmeyer et al., 2005), (Bentacourt et al., 2021).Following the consumption of illicit drugs, the body enzymatically breaks them down into by-products known as drug metabolites. As the process of metabolizing drugs is the same within all humans, the type and quantity of produced metabolites is already known (Concheiro et al., 2007). With this knowledge, scientists are able to estimate the quantity of drugs consumed (drug usage) within a community by measuring the metabolites within the local wastewater (Zuccato et al., 2008). Estimating the use of illicit drugs within a community is a crucial public health matter due to the known harms that regular consumption can cause (Bannwarth et al., 2019). This information can be used to help direct policies and distribute resources to communities in need.

The following analysis is based off a wastewater analysis study which estimated the drug usage of amphetamines, cannabis, cocaine, 3,4-Methylenedioxymethamphetamine (MDMA), and methamphetamine within 137 cities across 29 European countries between 2011 and 2020 (EMCDDA, 2021). In our analysis, we sought to test whether drug usage varies between countries. Previous research suggests that drug usage is related to population-level age demographics; self-reported drug usage tends to be greatest in youth populations (Health Canada, 2010). Consistent with the definition of youth used by the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA), we chose to define youth as those aged 15 to 24. Accordingly, we hypothesized that drug usage will differ between countries, and that differences in the relative proportion of youth (expressed as a percentage, PctYouth) would help explain these differences in drug usage between countries. Previous research also suggests that drug usage is related to population-level wealth demographics (Lipari and Park-Lee, 2019). We chose GDP as a measure of population wealth because it is widely referenced and compared in the literature (Nagelhout et al., 2017). Accordingly, we hypothesized that differences in Gross Domestic Product (GDP) would also explain differences in drug usage between countries. The objectives of this project are outlined in Section 1.2; corresponding hypotheses are summarized as a table in Section 1.3.

This analysis is important because it helps contribute to our understanding of the social determinants of health. Our analysis will specifically help improve our understanding of the characteristics of populations that may correspond to increased drug usage.

1.2 Research Objectives

Table 1.2. List of research objectives

No. Objective
1. To examine how drug usage differs between European reference countries.
2. To quantify the effect of the relative number of youth represented by the population of a country (PctYouth) on drug usage by reference country.
3. To quantify the effect of the relative wealth of a country (GDP) on drug usage by reference country.

1.3 Hypotheses

The effect of reference country on drug usage is dependent on the age (PctYouth) and wealth (GDP) characteristics of each reference country.

1.4 Sampling design

In an attempt to improve the reliability of the data, only reference countries that had drug usage estimates available for at least six reference cities in the original drug usage data set were considered in our analysis. Thus, the reference countries for our analysis include: Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland. The sample for the following analysis was constructed from the observations corresponding to the average weekday drug usage of 5 different types of reference drugs (amphetamines, canabis, cocaine, methamphetamine, and MDMA), measured across reference cities nested within six European reference countries, between the reference years 2011 and 2020. Average weekday usage was chosen because weekend usage data are unavailable for many of the observations in the original drug usage data set. Note that observations corresponding to a number of combinations of type of illicit reference drug, reference city, and reference year are also missing from the original data. For example, many measurements are missing for cannabis; thus, we chose to not include cannabis in our final analysis.

1.5 Study design

In the following, we use an observational study design to assess how drug usage differs across European reference countries included in our sample. The response variable of this study is drug usage (measured in mg/1000 people/day); the explanatory variable of this study is reference country (Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland). We are also interested in two cofactors: PctYouth, the relative number of youth represented by the population of a country (measured as a proportion), and GDP, a measure of the monetary value of economic activity (i.e. the production of goods and services) corresponding to a country (measured in 2010 USD). These cofactors are utilized to quantify the effect of population age and wealth demographics, respectively, on the relationship between drug usage, our explanatory variable, and reference country, the explanatory variable (Section 1.1). Thus, we construct linear models to address our research objectives (Section 6.0). Using these models, we also test for interaction effects between the two cofactors (PctYouth, GDP) used in this analysis.

1.6 Operational Definitions

Throughout this document, bold text represents vocabulary terms that have been given specific, operational definitions for the context of the following analysis. These terms and definitions are provided as a table below.

Table 1.6. Operational definitions.

Term Definition
Drug Usage Reported for each reference city and reference drug in mg/1000 people/day (EMCDDA, 2021). For additional information, see Appendix A.
Gross Domestic Product (GDP) The sum of gross value added by all resident producers in the economy, plus any product taxes and minus any subsidies not included in the value of products, reported for each country of interest in 2010 United States Dollars (USD). Calculated without making deductions for depreciation of fabricated assets for depletion and degradation of natural resources (World Bank Data, 2021). For more information on why this metric was chosen, see Section 3.3.
PctYouth The number of individuals in the population that fall within the age range 15-24. Reported for each country of interest as a percentage (UN Department of Economic and Social Affairs, 2019). For more information on why this metric was chosen, see Section 3.2.
Reference city The cities across the 6 European countries that drug usage was estimated in. For additional information, see Appendix A.
Reference country The 6 European countries (Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland) corresponding to drug usage estimates used in the present analysis. For ease of recognition, the names of these reference countries are presented in italics throughout this document. For more information on how these countries were selected, see Appendix A.
Reference drug The 5 types of drugs (amphetamines, cannabis, cocaine, MDMA, methamphetamine) corresponding to drug usage estimates used in the present analysis. Throughout this document, the names of these illicit drugs are presented in italics for ease of recognition. For additional information on the types of illicit drugs considered in this analysis, see Appendix A.
Reference year The 10 years (2011-2020) corresponding to drug usage estimates used in the present analysis. For additional information, see Appendix A.
Wastewater based epidemiology (WBE) A chemical analysis technique used to detect compounds in the waters serviced by wastewater treatment facilities. For example, WBE is used in the present analysis to produce estimates of drug usage. For more information on WBE, see Appendix B.

2.0 Data acquisition

The following section provides a brief overview of the data acquisition for the drug usage, population age, and population wealth estimates used in our analysis (Section 2.1, Section 2.2, and Section 2.3, respectively). For more information, see the full project metadata in Appendix A.

2.1 Drug Use Data

Drug usage data was accessed through the European Monitoring Center for Drugs and Drug Addiction website (EMCDDA) (https://www.emcdda.europa.eu/publications/html/pods/waste-water-analysis_en#siteInfoTable). This data set looked at the usage of five common illicit drugs (amphetamines, cannabis, cocaine, methamphetamine, and MDMA), as measured in the wastewater of 137 cities across 29 European countries. The data were collected daily (in mg per 1000 people per day) over a span of 10 years (from 2011 to 2020). The data was averaged for each day of the week, the weekend, and the weekdays for each city over every year considered.

The subset of drug use data of interest is constructed from the mean weekday usage of each reference drug in each combination of reference city, nested within reference country, and reference year. For more information, including why weekday mean values were used and missing data in the original drug use data set, see Section 1.4.

2.2 Population Age Data

Data on population age demographics were accessed from the United Nations Department of Economic and Social Affairs website (https://population.un.org/wpp/). Data were available from 235 countries and regions over a period of 60 years (from 1950-2020). These data show the proportion of the population accounted for by a number of pre-defined age groups (e.g. youth aged 15-24).

The subset of age data of interest was the proportion of the population aged 15-24 (PctYouth) corresponding to combinations of reference countries and reference years included in our study sample (Section 1.4).

2.3 Population Wealth Data

Data on population wealth demographics were accessed via the World Bank website (https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?fbclid=IwAR3tVOmvWo6ijRcavhXuUEhoDZgkVFE4DeiD1CtiZDYYiqpVk8sj0kSxxIY). This data set includes annual Gross Domestic Product (GDP; measured in the value of the 2010 US Dollar) for approximately 200 countries and regions between 1960 and 2020.

The subset of wealth data of interest was the GDP corresponding to combinations of reference countries and reference years included in our study sample (Section 1.4).

3.0 Setup

In this section, we set the R Markdown Document (RMD) up for analysis. Setting the document up for analysis involved two steps, loading the dependency libraries (Section 3.1), and importing data (Section 3.2).

3.1 Load Libraries

The first step of setting up our RMD was to load the dependency libraries. Descriptions and references for each of these packages are available in Appendix C.

library(ggplot2)
library(dplyr)
library(ggfortify)
library(readr)
library(tidyr)
library(stringr)
library(lubridate)
library(plotrix)
library(lattice)

3.2 Import Data

The second step of setting up our RMD was to import the raw data. The raw data files used in this RMD are the raw .csv source data files (Appendix A). These files have also been made available as a part of a GitHub Repository for this project (https://github.com/trevor-lyman/European-drug-usage_2011-2020).

First, we imported the drug usage data.

# step 1: read .csv files
# 2011
amphetamine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-amphetamine-2011.csv")
cannabis_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-cannabis-2011.csv")
cocaine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-cocaine-2011.csv")
MDMA_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-MDMA-2011.csv")
methamphetamine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-methamphetamine-2011.csv")

# 2012
amphetamine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-amphetamine-2012.csv")
cannabis_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-cannabis-2012.csv")
cocaine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-cocaine-2012.csv")
MDMA_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-MDMA-2012.csv")
methamphetamine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-methamphetamine-2012.csv")

# 2013 -- no cannabis data available
amphetamine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-amphetamine-2013.csv")
cannabis_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-cannabis-2013.csv")
cocaine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-cocaine-2013.csv")
MDMA_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-MDMA-2013.csv")
methamphetamine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-methamphetamine-2013.csv")

# 2014 -- no cannabis data available
amphetamine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-amphetamine-2014.csv")
cocaine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-cocaine-2014.csv")
MDMA_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-MDMA-2014.csv")
methamphetamine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-methamphetamine-2014.csv")

# 2015 -- no cannabis data available
amphetamine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-amphetamine-2015.csv")
cocaine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-cocaine-2015.csv")
MDMA_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-MDMA-2015.csv")
methamphetamine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-methamphetamine-2015.csv")

# 2016 -- no cannabis data available
amphetamine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-amphetamine-2016.csv")
cocaine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-cocaine-2016.csv")
MDMA_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-MDMA-2016.csv")
methamphetamine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-methamphetamine-2016.csv")

# 2017 -- no cannabis data available
amphetamine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-amphetamine-2017.csv")
cocaine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-cocaine-2017.csv")
MDMA_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-MDMA-2017.csv")
methamphetamine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-methamphetamine-2017.csv")

# 2018 
amphetamine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-amphetamine-2018.csv")
cannabis_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-cannabis-2018.csv")
cocaine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-cocaine-2018.csv")
MDMA_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-MDMA-2018.csv")
methamphetamine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-methamphetamine-2018.csv")

# 2019
amphetamine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-amphetamine-2019.csv")
cannabis_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-cannabis-2019.csv")
cocaine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-cocaine-2019.csv")
MDMA_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-MDMA-2019.csv")
methamphetamine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-methamphetamine-2019.csv")

# 2020
amphetamine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-amphetamine-2020.csv")
cannabis_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-cannabis-2020.csv")
cocaine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-cocaine-2020.csv")
MDMA_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-MDMA-2020.csv")
methamphetamine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-methamphetamine-2020.csv")

# step 2: check data
head(amphetamine_2011); str(amphetamine_2011) # check the data
##   SiteID country         city Wednesday Thursday Friday Saturday Sunday Monday
## 1 site14      BE Antwerp Zuid    229.63   211.52 318.89   318.79 311.03 272.62
## 2 site17      BE     Brussels     22.92    28.77  39.31    57.57  44.62  32.38
## 3 site34      CZ      Budweis     20.20    27.07  24.60    29.43  23.50  37.31
## 4 site55      ES    Barcelona      7.40    12.14  13.21    25.29  19.50  15.94
## 5 site56      ES    Castellon      0.00     0.00   0.00     0.00   0.00   0.00
## 6 site59      ES     Santiago     33.99    35.46  41.20    40.04  30.08     NA
##   Tuesday Weekday.mean Weekend.mean Daily.mean
## 1  277.78       239.64       305.33     277.18
## 2   22.18        24.63        43.47      35.39
## 3   37.43        28.23        28.71      28.51
## 4   14.41        11.31        18.48      15.41
## 5    0.00         0.00         0.00       0.00
## 6   22.29        30.58        37.11      33.84
## 'data.frame':    14 obs. of  13 variables:
##  $ SiteID      : chr  "site14" "site17" "site34" "site55" ...
##  $ country     : chr  "BE" "BE" "CZ" "ES" ...
##  $ city        : chr  "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
##  $ Wednesday   : num  229.6 22.9 20.2 7.4 0 ...
##  $ Thursday    : num  211.5 28.8 27.1 12.1 0 ...
##  $ Friday      : num  318.9 39.3 24.6 13.2 0 ...
##  $ Saturday    : num  318.8 57.6 29.4 25.3 0 ...
##  $ Sunday      : num  311 44.6 23.5 19.5 0 ...
##  $ Monday      : num  272.6 32.4 37.3 15.9 0 ...
##  $ Tuesday     : num  277.8 22.2 37.4 14.4 0 ...
##  $ Weekday.mean: num  239.6 24.6 28.2 11.3 0 ...
##  $ Weekend.mean: num  305.3 43.5 28.7 18.5 0 ...
##  $ Daily.mean  : num  277.2 35.4 28.5 15.4 0 ...

Next, we imported the population age (PctYouth) data.

raw.PctYouth <- read.csv("Data/Age Data/raw.age.csv", skip = 16) # read .csv file
raw.PctYouth <- raw.PctYouth %>%
  select(Region..subregion..country.or.area.., 
         Reference.date..as.of.1.July., 
         Population.aged.15.24) 
# select 3 variables of interest

head(raw.PctYouth); str(raw.PctYouth) # check data
##   Region..subregion..country.or.area.. Reference.date..as.of.1.July.
## 1                                WORLD                          1950
## 2                                WORLD                          1951
## 3                                WORLD                          1952
## 4                                WORLD                          1953
## 5                                WORLD                          1954
## 6                                WORLD                          1955
##   Population.aged.15.24
## 1                  18.2
## 2                  18.1
## 3                  18.0
## 4                  17.8
## 5                  17.7
## 6                  17.6
## 'data.frame':    18105 obs. of  3 variables:
##  $ Region..subregion..country.or.area..: chr  "WORLD" "WORLD" "WORLD" "WORLD" ...
##  $ Reference.date..as.of.1.July.       : int  1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
##  $ Population.aged.15.24               : chr  "18.2" "18.1" "18.0" "17.8" ...

Finally, we imported the population wealth (GDP) data.

raw.gdp <- as.data.frame(na.omit(pivot_longer(data = read.csv("Data/GDP Data/raw.gdp.csv", skip = 4), 
                                              cols = 5:65, names_to = "year", values_to = "GDP"))) 
#read .csv file; use pivot_longer() to transform to long format

head(raw.gdp); str(raw.gdp) #check data
##   Country.Name Country.Code    Indicator.Name Indicator.Code  year       GDP
## 1        Aruba          ABW GDP (current US$) NY.GDP.MKTP.CD X1986 405463417
## 2        Aruba          ABW GDP (current US$) NY.GDP.MKTP.CD X1987 487602458
## 3        Aruba          ABW GDP (current US$) NY.GDP.MKTP.CD X1988 596423607
## 4        Aruba          ABW GDP (current US$) NY.GDP.MKTP.CD X1989 695304363
## 5        Aruba          ABW GDP (current US$) NY.GDP.MKTP.CD X1990 764887117
## 6        Aruba          ABW GDP (current US$) NY.GDP.MKTP.CD X1991 872138715
## 'data.frame':    12762 obs. of  6 variables:
##  $ Country.Name  : chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
##  $ Country.Code  : chr  "ABW" "ABW" "ABW" "ABW" ...
##  $ Indicator.Name: chr  "GDP (current US$)" "GDP (current US$)" "GDP (current US$)" "GDP (current US$)" ...
##  $ Indicator.Code: chr  "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" ...
##  $ year          : chr  "X1986" "X1987" "X1988" "X1989" ...
##  $ GDP           : num  4.05e+08 4.88e+08 5.96e+08 6.95e+08 7.65e+08 ...
##  - attr(*, "na.action")= 'omit' Named int [1:3464] 1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "names")= chr [1:3464] "1" "2" "3" "4" ...

4.0 Tidying the Data

After our data was imported into the R environment, our next step was to restructure the data to prepare for analysis. This is often referred to as ‘tidying’ the data (Beckerman et al., 2017). We do this individually for the drug usage, PctYouth, and GDP data in subsections Section 4.1, Section 4.2, and Section 4.3, respectively. The goal of this section was to build a long-wise table that presented the reference country, reference city, reference year, reference drug, and average weekday drug use for each observation in the sample. This final step is shown in subsection Section 4.5.

4.1 Drug Usage Data

First, we tidied the drug usage data; this is shown below in 3 nested steps. These data were imported as individual .csv files for each type of drug and reference year in Section 3.2; we wanted to merge these observations into a single dataframe that contained observations of the average weekday drug usage corresponding to each type of drug from each combination of reference country and reference year in the sample.

The first step of tidying the drug usage data is shown below. We began by tidying each of the data corresponding to observations of the use of a reference drug from a single reference year. We repeated this step for each reference drug for a single reference year.

#step 1: 2011 amphetamine
amphetamine_2011.tidy <- amphetamine_2011 %>% #create data frame
  select(c(country, city, Weekday.mean)) %>% 
  # subset country, city, weekday mean concentration
  mutate(year = c(rep(2011, length(amphetamine_2011$country)))) %>%
  # add variable for reference year
  mutate(drug = c(rep("amphetamine", length(amphetamine_2011$country))))
  # add variable for type of drug usage

amphetamine_2011.tidy$country <- as.factor(amphetamine_2011.tidy$country)
# make country a factor
levels(amphetamine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain", 
                                           FR = "France", B = "Great Britain", HR = "Hungary", 
                                           IT = "Italy", NL = "Netherlands", NO = "Norway", 
                                           SE = "Sweden")
# change country codes to names of countries 
head(amphetamine_2011.tidy); str(amphetamine_2011.tidy) # check the data
##          country         city Weekday.mean year        drug
## 1        Belgium Antwerp Zuid       239.64 2011 amphetamine
## 2        Belgium     Brussels        24.63 2011 amphetamine
## 3 Czech Republic      Budweis        28.23 2011 amphetamine
## 4          Spain    Barcelona        11.31 2011 amphetamine
## 5          Spain    Castellon         0.00 2011 amphetamine
## 6          Spain     Santiago        30.58 2011 amphetamine
## 'data.frame':    14 obs. of  5 variables:
##  $ country     : Factor w/ 10 levels "Belgium","Czech Republic",..: 1 1 2 3 3 3 4 5 6 7 ...
##  $ city        : chr  "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
##  $ Weekday.mean: num  239.6 24.6 28.2 11.3 0 ...
##  $ year        : num  2011 2011 2011 2011 2011 ...
##  $ drug        : chr  "amphetamine" "amphetamine" "amphetamine" "amphetamine" ...

The next step was to build an annual drug usage summary, as shown below. This is a long-wise table showing the types and amount of drug usage observed in each reference city in the given reference year.

# 2011 cannabis 
cannabis_2011.tidy <- cannabis_2011 %>% # create data frame
  select(c(country, city, Weekday.mean)) %>%
   # subset country, city, weekday mean concentration
  mutate(year = c(rep(2011, length(cannabis_2011$country)))) %>%
  # add variable for reference year
  mutate(drug = c(rep("cannabis", length(cannabis_2011$country))))
  # add variable for type of drug usage

cannabis_2011.tidy$country <- as.factor(cannabis_2011.tidy$country)
# make country a factor 
levels(cannabis_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain", 
                                        FR = "France", HR = "Hungary", IT = "Italy", 
                                        NL = "Netherlands", SE = "Sweden")
# change country codes to names of countries 

# 2011 cocaine
cocaine_2011.tidy <- cocaine_2011 %>% # create data frame
  select(c(country, city, Weekday.mean)) %>%
   # subset country, city, weekday mean concentration
  mutate(year = c(rep(2011, length(cocaine_2011$country)))) %>%
  # add variable for reference year
  mutate(drug = c(rep("cocaine", length(cocaine_2011$country))))
  # add variable for type of drug usage
cocaine_2011.tidy$country <- as.factor(cocaine_2011.tidy$country)
# make country a factor 
levels(cocaine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain", 
                                       FR = "France", GB = "Great Britain", HR = "Hungary", 
                                       IT = "Italy", NL = "Netherlands", NO = "Norway", SE = "Sweden")
# change country codes to names of countries 

# 2011 MDMA
MDMA_2011.tidy <- MDMA_2011 %>% # create data frame
  select(c(country, city, Weekday.mean)) %>%
   # subset country, city, weekday mean concentration
  mutate(year = c(rep(2011, length(MDMA_2011$country)))) %>%
  # add variable for reference year
  mutate(drug = c(rep("MDMA", length(MDMA_2011$country))))
  # add variable for type of drug metabolie
MDMA_2011.tidy$country <- as.factor(MDMA_2011.tidy$country)
# make country a factor 
levels(MDMA_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain", FR = "France", 
                                    GB = "Great Britain", HR = "Hungary", IT = "Italy", 
                                    NL = "Netherlands", NO = "Norway", SE = "Sweden")
# change country codes to names of countries 

methamphetamine_2011.tidy.temp <- methamphetamine_2011 %>% # create data frame
  select(c(country, city, methamphetamineWDMean2011)) %>%
  # note that weekday mean concentration has a different name in this file than all others
  mutate(Weekday.mean = methamphetamine_2011$methamphetamineWDMean2011) %>%
  # fix name of weekday mean concentration to be consistent with others
  mutate(year = c(rep(2011, length(methamphetamine_2011$country)))) %>%
  # add variable for reference year
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2011$country))))
  # add variable for type of drug usage
methamphetamine_2011.tidy <- select(methamphetamine_2011.tidy.temp, -c(methamphetamineWDMean2011))
# remove weekday mean variable with incorrect name
methamphetamine_2011.tidy$country <- as.factor(methamphetamine_2011.tidy$country)
# make country a factor 
levels(methamphetamine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain", 
                                               FR = "France", GB = "Great Britain", HR = "Hungary", 
                                               IT = "Italy", NL = "Netherlands", NO = "Norway", 
                                               SE = "Sweden")
# change country codes to names of countries 

drug_2011 <- rbind(amphetamine_2011.tidy, cannabis_2011.tidy, cocaine_2011.tidy, MDMA_2011.tidy, 
                   methamphetamine_2011.tidy) # use rbind() to merge all drug usages in a single dataframe

head(drug_2011); str(drug_2011) #check the data
##          country         city Weekday.mean year        drug
## 1        Belgium Antwerp Zuid       239.64 2011 amphetamine
## 2        Belgium     Brussels        24.63 2011 amphetamine
## 3 Czech Republic      Budweis        28.23 2011 amphetamine
## 4          Spain    Barcelona        11.31 2011 amphetamine
## 5          Spain    Castellon         0.00 2011 amphetamine
## 6          Spain     Santiago        30.58 2011 amphetamine
## 'data.frame':    71 obs. of  5 variables:
##  $ country     : Factor w/ 10 levels "Belgium","Czech Republic",..: 1 1 2 3 3 3 4 5 6 7 ...
##  $ city        : chr  "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
##  $ Weekday.mean: num  239.6 24.6 28.2 11.3 0 ...
##  $ year        : num  2011 2011 2011 2011 2011 ...
##  $ drug        : chr  "amphetamine" "amphetamine" "amphetamine" "amphetamine" ...

The final step of tidying the drug usage data was to repeat the above steps for each reference year in the sample, and to merge the resulting 10 annual drug usage summaries into a single long-wise table adding the variable reference year. Again, this step is shown below.

# 2012
# amphetamine
amphetamine_2012.tidy <- amphetamine_2012 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2012, length(amphetamine_2012$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2012$country))))
amphetamine_2012.tidy$country <- as.factor(amphetamine_2012.tidy$country)
levels(amphetamine_2012$country) <- c(BE = "Belgium", CH = "Switzerland", CZ = "Czech Republic", 
                                      ES = "Spain", FI = "Finland", FR = "France", GB = "Great Britain", 
                                      HR = "Hungary", IT = "Italy", NL = "Netherlands", 
                                      NO = "Norway", SE = "Sweden")
# cannabis 
cannabis_2012.tidy <- cannabis_2012 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2012, length(cannabis_2012$country)))) %>%
  mutate(drug = c(rep("cannabis", length(cannabis_2012$country))))
cannabis_2012.tidy$country <- as.factor(cannabis_2012.tidy$country)
levels(cannabis_2012.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain", 
                                        FR = "France", HR = "Hungary", IT = "Italy", 
                                        NL = "Netherlands", NO = "Norway", SE = "Sweden")
# cocaine
cocaine_2012.tidy <- cocaine_2012 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2012, length(cocaine_2012$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2012$country))))
cocaine_2012.tidy$country <- as.factor(cocaine_2012.tidy$country)
levels(cocaine_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", CZ = "Czech Republic", 
                                       ES = "Spain", FI = "Finland", FR = "France", HR = "Hungary", 
                                       IT = "Italy", NL = "Netherlands", NO = "Norway", SE = "Sweden")
# MDMA
MDMA_2012.tidy <- MDMA_2012 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2012, length(MDMA_2012$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2012$country))))
MDMA_2012.tidy$country <- as.factor(MDMA_2012.tidy$country)
levels(MDMA_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", CZ = "Czech Republic", 
                                    ES = "Spain", FI = "Finland", HR = "Hungary", IT = "Italy", 
                                    NL = "Netherlands", NO = "Norway", SE = "Sweden")
# methamphetamine
methamphetamine_2012.tidy <- methamphetamine_2012 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2012, length(methamphetamine_2012$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2012$country))))
methamphetamine_2012.tidy$country <- as.factor(methamphetamine_2012.tidy$country)
levels(methamphetamine_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", 
                                               CZ = "Czech Republic", ES = "Spain", FR = "France", 
                                               HR = "Hungary", IT = "Italy", NL = "Netherlands", 
                                               NO = "Norway", SE = "Sweden")
# 2012 annual summary
drug_2012 <- rbind(amphetamine_2012.tidy, cannabis_2012.tidy, cocaine_2012.tidy, MDMA_2012.tidy, 
                   methamphetamine_2012.tidy) 

# 2013
# amphetamine
amphetamine_2013.tidy <- amphetamine_2013 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2013, length(amphetamine_2013$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2013$country))))
amphetamine_2013.tidy$city <- as.factor(amphetamine_2013.tidy$city)
amphetamine_2013.tidy$country <- as.factor(amphetamine_2013.tidy$country)
levels(amphetamine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland", 
                                           CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                           DK = "Denmark", ES = "Spain", FI = "Finland", 
                                           FR = "France", GB = "Great Britain", GR = "Greece", 
                                           HR = "Hungary", NL = "Netherlands", NO = "Norway", 
                                           PT = "Portugal", RO = "Romania", RS = "Serbia", 
                                           SE = "Sweden", SK = "Slovakia")
# cannabis
cannabis_2013.tidy <- cannabis_2013 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2013, length(cannabis_2013$country)))) %>%
  mutate(drug = c(rep("cannabis", length(cannabis_2013$country))))
cannabis_2013.tidy$city <- as.factor(cannabis_2013.tidy$city)
cannabis_2013.tidy$country <- as.factor(cannabis_2013.tidy$country)
levels(cannabis_2013.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany", 
                                        DK = "Denmark", ES = "Spain", FR = "France", GR = "Greece", 
                                        HR = "Hungary", IT = "Italy", NL = "Netherlands", 
                                        NO = "Norway", PT = "Portugal")
# cocaine
cocaine_2013.tidy <- cocaine_2013 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2013, length(cocaine_2013$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2013$country))))
cocaine_2013.tidy$city <- as.factor(cocaine_2013.tidy$city)
cocaine_2013.tidy$country <- as.factor(cocaine_2013.tidy$country)
levels(cocaine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                       CZ = "Czech Republic", DE = "Germany", DK = "Denmark", 
                                       ES = "Spain", FR = "France", GB = "Great Britain", GR = "Greece", 
                                       HR = "Hungary", IT = "Italy", NL = "Netherlands", NO = "Norway", 
                                       PT = "Portugal", RO = "Romania", RS = "Serbia", SE = "Sweden", 
                                       SK = "Slovakia")
# MDMA
MDMA_2013.tidy <- MDMA_2013 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2013, length(MDMA_2013$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2013$country))))
MDMA_2013.tidy$city <- as.factor(MDMA_2013.tidy$city)
MDMA_2013.tidy$country <- as.factor(MDMA_2013.tidy$country)
levels(MDMA_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                    CZ = "Czech Republic", DE = "Germany", DK = "Denmark", 
                                    ES = "Spain", FI = "Finland", FR = "France", GB = "Great Britain", 
                                    GR = "Greece", HR = "Hungary", IT = "Italy", NL = "Netherlands", 
                                    NO = "Norway", PT = "Portugal", RO = "Romania", RS = "Serbia", 
                                    SE = "Sweden", SK = "Slovakia")
# methamphetamine
methamphetamine_2013.tidy <- methamphetamine_2013 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2013, length(methamphetamine_2013$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2013$country))))
methamphetamine_2013.tidy$city <- as.factor(methamphetamine_2013.tidy$city)
methamphetamine_2013.tidy$country <- as.factor(methamphetamine_2013.tidy$country)
levels(methamphetamine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland", 
                                               CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                               DK = "Denmark", ES = "Spain", FR = "France", 
                                               GB = "Great Britain", GF = "Greece", IT = "Italy", 
                                               NL = "Netherlands", NO = "Norway", PT = "Portugal", 
                                               RO = "Romania", RS = "Serbia", SE = "Sweden", 
                                               SK = "Slovakia")
# 2013 annual summary
drug_2013 <- rbind(amphetamine_2013.tidy, cannabis_2013.tidy, cocaine_2013.tidy, MDMA_2013.tidy, 
                   methamphetamine_2013.tidy)

# 2014
# amphetamine
amphetamine_2014.tidy <- amphetamine_2014 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2014, length(amphetamine_2014$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2014$country))))
amphetamine_2014.tidy$country <- as.factor(amphetamine_2014.tidy$country)
levels(amphetamine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany", 
                                           ES = "Spain", FI = "Finland", FR = "France", 
                                           GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                           IT = "Italy", NL = "Netherlands", PT = "Portugal")
# cocaine
cocaine_2014.tidy <- cocaine_2014 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2014, length(cocaine_2014$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2014$country))))
cocaine_2014.tidy$country <- as.factor(cocaine_2014.tidy$country)
levels(cocaine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany", 
                                       DK = "Denmark", ES = "Spain", FI = "Finland", 
                                       FR = "France", GB = "Great Britain", GR = "Greece", 
                                       HR = "Hungary", IT = "Italy", NL = "Netherlands", 
                                       NO = "Norway", PT = "Portugal")
# MDMA
MDMA_2014.tidy <- MDMA_2014 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2014, length(MDMA_2014$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2014$country))))
MDMA_2014.tidy$city <- as.factor(MDMA_2014.tidy$city)
MDMA_2014.tidy$country <- as.factor(MDMA_2014.tidy$country)
levels(MDMA_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany", DK = "Denmark", 
                                    ES = "Spain", FI = "Finland", FR = "France", GB = "Great Britain", 
                                    GR = "Greece", HR = "Hungary", IT = "Italy", NL = "Netherlands", 
                                    NO = "Norway", PT = "Portugal")
# methamphetamine 
methamphetamine_2014.tidy <- methamphetamine_2014 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2014, length(methamphetamine_2014$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2014$country))))
methamphetamine_2014.tidy$city <- as.factor(methamphetamine_2014.tidy$city)
methamphetamine_2014.tidy$country <- as.factor(methamphetamine_2014.tidy$country)
levels(methamphetamine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany", 
                                               DK = "Denmark", ES = "Spain", FI = "Finland", 
                                               FR = "France", GB = "Great Britain", GR = "Greece", 
                                               HR = "Hungary", IT = "Italy", NL = "Netherlands", 
                                               NO = "Norway", PT = "Portugal")
# 2014 annual summary
drug_2014 <- rbind(amphetamine_2014.tidy, cocaine_2014.tidy, MDMA_2014.tidy, methamphetamine_2014.tidy)

# 2015
# amphetamine 
amphetamine_2015.tidy <- amphetamine_2015 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2015, length(amphetamine_2015$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2015$country))))
amphetamine_2015.tidy$country <- as.factor(amphetamine_2015.tidy$country)
levels(amphetamine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                           CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                           DK = "Denmark", ES = "Spain", FI = "Finland", 
                                           FR = "France", GB = "Great Britain", GR = "Greece", 
                                           HR = "Hungary", IS = "Iceland", IT = "Italy", 
                                           MT = "Malta", NL = "Netherlands", NO = "Norway", 
                                           PT = "Portugal", SK = "Slovakia")
# cocaine
cocaine_2015.tidy <- cocaine_2015 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2015, length(cocaine_2015$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2015$country))))
cocaine_2015.tidy$country <- as.factor(cocaine_2015.tidy$country)
levels(cocaine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                       CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                       DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France", 
                                       GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                       IS = "Iceland", IT = "Italy", MT = "Malta", NL = "Netherlands", 
                                       NO = "Norway", PT = "Portugal", SK = "Slovakia")
# MDMA
MDMA_2015.tidy <- MDMA_2015 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2015, length(MDMA_2015$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2015$country))))
MDMA_2015.tidy$country <- as.factor(MDMA_2015.tidy$country)
levels(MDMA_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                    CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain", 
                                    FI = "Finland", FR = "France", GB = "Great Britain", GR = "Greece", 
                                    HR = "Hungary", IT = "Italy", NL = "Netherlands", NO = "Norway", 
                                    PT = "Portugal", SK = "Slovakia")
# methamphetamine 
methamphetamine_2015.tidy <- methamphetamine_2015 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2015, length(methamphetamine_2015$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2015$country))))
methamphetamine_2015.tidy$country <- as.factor(methamphetamine_2015.tidy$country)
levels(methamphetamine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                               CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                               DK = "Denmark", ES = "Spain", FI = "Finland", 
                                               FR = "France", GB = "Great Britain", GR = "Greece", 
                                               HR = "Hungary", IS = "Iceland", IT = "Italy", 
                                               MT = "Malta", NL = "Netherlands", NO = "Norway", 
                                               PT = "Portugal", SK = "Slovakia")
# 2015 annual summary
drug_2015 <- rbind(amphetamine_2015.tidy, cocaine_2015.tidy, MDMA_2015.tidy, methamphetamine_2015.tidy)

# 2016
# amphetamine
amphetamine_2016.tidy <- amphetamine_2016 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2016, length(amphetamine_2016$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2016$country))))
amphetamine_2016.tidy$country <- as.factor(amphetamine_2016.tidy$country)
levels(amphetamine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                           CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                           ES = "Spain", FI = "Finland", FR = "France", 
                                           GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                           IS = "Iceland", IT = "Italy", NE = "Netherlands", 
                                           NL = "Netherlands", NO = "Norway", PT = "Portugal", 
                                           SE = "Sweden", SK = "Slovakia")
# cocaine 
cocaine_2016.tidy <- cocaine_2016 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2016, length(cocaine_2016$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2016$country))))
cocaine_2016.tidy$country <- as.factor(cocaine_2016.tidy$country)
levels(cocaine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                       CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                       ES = "Spain", FI = "Finland", FR = "France", 
                                       GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                       IS = "Iceland", IT = "Italy", NL = "Netherlands", NO = "Norway", 
                                       PT = "Portugal", SE = "Sweden", SK = "Slovakia")
# MDMA
MDMA_2016.tidy <- MDMA_2016 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2016, length(MDMA_2016$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2016$country))))
MDMA_2016.tidy$country <- as.factor(MDMA_2016.tidy$country)
levels(MDMA_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                    CZ = "Czech Republic", DE = "Germany", ES = "Spain", FI = "Finland", 
                                    FR = "France", GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                    IS = "Iceland", IT = "Italy", NL = "Netherlands", NO = "Norway", 
                                    PT = "Portugal", SE = "Sweden", SK = "Slovakia")
# methamphetamine
methamphetamine_2016.tidy <- methamphetamine_2016 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2016, length(methamphetamine_2016$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2016$country))))
methamphetamine_2016.tidy$country <- as.factor(methamphetamine_2016.tidy$country)
levels(methamphetamine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                               CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                               ES = "Spain", FI = "Finland", FR = "France", 
                                               GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                               IS = "Iceland", IT = "Italy", NL = "Netherlands", 
                                               NO = "Norway", PT = "Portugal", SE = "Sweden", 
                                               SK = "Slovakia")
# 2016 annual summary
drug_2016 <- rbind(amphetamine_2016.tidy, cocaine_2016.tidy, MDMA_2016.tidy, methamphetamine_2016.tidy)

# 2017
# amphetamine
amphetamine_2017.tidy <- amphetamine_2017 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2017, length(amphetamine_2017$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2017$country))))
amphetamine_2017.tidy$country <- as.factor(amphetamine_2017.tidy$country)
levels(amphetamine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                           CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                           ES = "Spain", FI = "Finland", FR = "France", 
                                           GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                           IS = "Iceland", IT = "Italy", LT = "Lithuania", 
                                           NL = "Netherlands", NO = "Norway", PL = "Poland", 
                                           PT = "Portugal", SI = "Slovenia", SK = "Slovakia")
# cocaine
cocaine_2017.tidy <- cocaine_2017 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2017, length(cocaine_2017$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2017$country))))
cocaine_2017.tidy$country <- as.factor(cocaine_2017.tidy$country)
levels(cocaine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                       CZ = "Czech Republic", DE = "Germany", ES = "Spain", 
                                       FI = "Finland", FR = "France", GB = "Great Britain", 
                                       GR = "Greece", HR = "Hungary", IS = "Iceland", IT = "Italy", 
                                       LT = "Lithuania", NL = "Netherlands", NO = "Norway", 
                                       PT = "Portugal", SI = "Slovenia", SK = "Slovakia")
# MDMA
MDMA_2017.tidy <- MDMA_2017 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2017, length(MDMA_2017$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2017$country))))
MDMA_2017.tidy$country <- as.factor(MDMA_2017.tidy$country)
levels(MDMA_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                    CZ = "Czech Republic", DE = "Germany", ES = "Spain", FI = "Finland", 
                                    FR = "France", GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                    IS = "Iceland", IT = "Italy", LT = "Lithuania", NL = "Netherlands", 
                                    NO = "Norway", PL = "Poland", PT = "Portugal", SI = "Slovenia", 
                                    SK = "Slovakia")
# methamphetamine
methamphetamine_2017.tidy <- methamphetamine_2017 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2017, length(methamphetamine_2017$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2017$country))))
methamphetamine_2017.tidy$country <- as.factor(methamphetamine_2017.tidy$country)
levels(methamphetamine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                               CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                               ES = "Spain", FI = "Finland", FR = "France", 
                                               GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                               IT = "Italy", LT = "Lithuania", NL = "Netherlands", 
                                               NO = "Norway", PL = "Poland", PT = "Portugal", 
                                               SI = "Slovenia", SK = "Slovakia")
# 2017 annual summary
drug_2017 <- rbind(amphetamine_2017.tidy, cocaine_2017.tidy, MDMA_2017.tidy, methamphetamine_2017.tidy)

# 2018
# amphetamine
amphetamine_2018.tidy <- amphetamine_2018 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2018, length(amphetamine_2018$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2018$country))))
amphetamine_2018.tidy$country <- as.factor(amphetamine_2018.tidy$country)
levels(amphetamine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                           CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                           DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France", 
                                           GR = "Greece", HR = "Hungary", IS = "Iceland", IT = "Italy", 
                                           LT = "Lithuania", NL = "Netherlands", NO = "Norway",  
                                           PT = "Portugal", SI = "Slovenia", SK = "Slovakia", 
                                           TR = "Turkey")
# cannabis
cannabis_2018.tidy <- cannabis_2018 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2018, length(cannabis_2018$country)))) %>%
  mutate(drug = c(rep("cannabis", length(cannabis_2018$country))))
cannabis_2018.tidy$country <- as.factor(cannabis_2018.tidy$country)
levels(cannabis_2018.tidy$country) <- c(AT = "Austria", CZ = "Czech Republic", ES = "Spain", 
                                        FR = "France", GR = "Greece", HR = "Hungary", IS = "Iceland", 
                                        IT = "Italy", NL = "Netherlands", PT = "Portugal", 
                                        SI = "Slovenia", SK = "Slovakia")
# cocaine
cocaine_2018.tidy <- cocaine_2018 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2018, length(cocaine_2018$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2018$country))))
cocaine_2018.tidy$country <- as.factor(cocaine_2018.tidy$country)
levels(cocaine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                       CZ = "Czech Republic", DE = "Germany", DK = "Denmark", 
                                       ES = "Spain", FI = "Finland", FR = "France", 
                                       GB = "Great Britain", HR = "Hungary", IS = "Iceland", 
                                       IT = "Italy", LT = "Lithuania", NL = "Netherlands", 
                                       NO = "Norway", PT = "Portugal", SI = "Slovenia", 
                                       SK = "Slovakia")
# MDMA
MDMA_2018.tidy <- MDMA_2018 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2018, length(MDMA_2018$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2018$country))))
MDMA_2018.tidy$country <- as.factor(MDMA_2018.tidy$country)
levels(MDMA_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                    CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain", 
                                    FI = "Finland", FR = "France", GR = "Greece", HR = "Hungary", 
                                    IS = "Iceland", IT = "Italy", LT = "Lithuania", NL = "Netherlands", 
                                    NO = "Norway", PT = "Portugal", SI = "Slovenia", SK = "Slovakia")
# methamphetamine
methamphetamine_2018.tidy <- methamphetamine_2018 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2018, length(methamphetamine_2018$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2018$country))))
methamphetamine_2018.tidy$country <- as.factor(methamphetamine_2018.tidy$country)
levels(methamphetamine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CY = "Cyprus", 
                                               CZ = "Czech Republic", DE = "Germany", DK = "Denmark", 
                                               ES = "Spain", FI = "Finland", FR = "France", 
                                               GB = "Great Britain", HR = "Hungary", IS = "Iceland", 
                                               IT = "Italy", LT = "Lithuania", NL = "Netherlands", 
                                               NO = "Norway",  PT = "Portugal", SI = "Slovenia", 
                                               SK = "Slovakia", TR = "Turkey")
# 2018 annual summary
drug_2018 <- rbind(amphetamine_2018.tidy, cannabis_2018.tidy, cocaine_2018.tidy, MDMA_2018.tidy, 
                   methamphetamine_2018.tidy)

# 2019
# amphetamine
amphetamine_2019.tidy <- amphetamine_2019 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2019, length(amphetamine_2019$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2019$country))))
amphetamine_2019.tidy$country <- as.factor(amphetamine_2019.tidy$country)
levels(amphetamine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                           CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                           DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France", 
                                           HR = "Hungary", IS = "Iceland", IT = "Italy", 
                                           LT = "Lithuania", LV = "Latvia", NL = "Netherlands", 
                                           NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden", 
                                           SI = "Slovenia", TR = "Turkey")
# cannabis
cannabis_2019.tidy <- cannabis_2019 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2019, length(cannabis_2019$country)))) %>%
  mutate(drug = c(rep("cannabis", length(cannabis_2019$country))))
cannabis_2019.tidy$country <- as.factor(cannabis_2019.tidy$country)
levels(cannabis_2019.tidy$country) <- c(AT = "Austria", CH = "Switzerland", CZ = "Czech Republic", ES = "Spain", 
                                        FR = "France", GR = "Greece", HR = "Hungary", IS = "Iceland", 
                                        IT = "Italy", NL = "Netherlands", PL = "Poland", PT = "Portugal", 
                                        SE = "Sweden", SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# cocaine
cocaine_2019.tidy <- cocaine_2019 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2019, length(cocaine_2019$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2019$country))))
cocaine_2019.tidy$country <- as.factor(cocaine_2019.tidy$country)
levels(cocaine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
                                       CZ = "Czech Republic", DE = "Germany", DK = "Denmark", 
                                       ES = "Spain", FI = "Finland", FR = "France", 
                                       GB = "Great Britain", GR = "Greece", HR = "Hungary", IS = "Iceland", 
                                       IT = "Italy", LT = "Lithuania", LV = "Latvia", NL = "Netherlands", 
                                       NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden", 
                                       SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# MDMA
MDMA_2019.tidy <- MDMA_2019 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2019, length(MDMA_2019$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2019$country))))
MDMA_2019.tidy$country <- as.factor(MDMA_2019.tidy$country)
levels(MDMA_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                    CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain", 
                                    FI = "Finland", FR = "France", GB = "Great Britain", GR = "Greece", 
                                    HR = "Hungary", IS = "Iceland", IT = "Italy", LT = "Lithuania", 
                                    LV = "Latvia", NL = "Netherlands", NO = "Norway", PL = "Poland", 
                                    PT = "Portugal", SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# methamphetamine
methamphetamine_2019.tidy <- methamphetamine_2019 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2019, length(methamphetamine_2019$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2019$country))))
methamphetamine_2019.tidy$country <- as.factor(methamphetamine_2019.tidy$country)
levels(methamphetamine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                               CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                               DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France", 
                                               GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                               IS = "Iceland", IT = "Italy", LT = "Lithuania", LV = "Latvia", 
                                               NL = "Netherlands", NO = "Norway",  PL = "Poland", PT = "Portugal", 
                                               SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# 2019 annual summary
drug_2019 <- rbind(amphetamine_2019.tidy, cannabis_2019.tidy, cocaine_2019.tidy, MDMA_2019.tidy, 
                   methamphetamine_2019.tidy)

# 2020
# amphetamine
amphetamine_2020.tidy <- amphetamine_2020 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2020, length(amphetamine_2020$country)))) %>%
  mutate(drug = c(rep("amphetamine", length(amphetamine_2020$country))))
amphetamine_2020.tidy$country <- as.factor(amphetamine_2020.tidy$country)
levels(amphetamine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                           CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                           DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France", 
                                           HR = "Hungary", IS = "Iceland", IT = "Italy", 
                                           LT = "Lithuania", LV = "Latvia", NL = "Netherlands", 
                                           NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden", 
                                           SI = "Slovenia", TR = "Turkey")
# cannabis
cannabis_2020.tidy <- cannabis_2020 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2020, length(cannabis_2020$country)))) %>%
  mutate(drug = c(rep("cannabis", length(cannabis_2020$country))))
cannabis_2020.tidy$country <- as.factor(cannabis_2020.tidy$country)
levels(cannabis_2020.tidy$country) <- c(AT = "Austria", CH = "Switzerland", CZ = "Czech Republic", ES = "Spain", 
                                        FR = "France", GR = "Greece", HR = "Hungary", IS = "Iceland", 
                                        IT = "Italy", NL = "Netherlands", PL = "Poland", PT = "Portugal", 
                                        SE = "Sweden", SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# cocaine
cocaine_2020.tidy <- cocaine_2020 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2020, length(cocaine_2020$country)))) %>%
  mutate(drug = c(rep("cocaine", length(cocaine_2020$country))))
cocaine_2020.tidy$country <- as.factor(cocaine_2020.tidy$country)
levels(cocaine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
                                       CZ = "Czech Republic", DE = "Germany", DK = "Denmark", 
                                       ES = "Spain", FI = "Finland", FR = "France", 
                                       GB = "Great Britain", GR = "Greece", HR = "Hungary", IS = "Iceland", 
                                       IT = "Italy", LT = "Lithuania", LV = "Latvia", NL = "Netherlands", 
                                       NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden", 
                                       SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# MDMA
MDMA_2020.tidy <- MDMA_2020 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2020, length(MDMA_2020$country)))) %>%
  mutate(drug = c(rep("MDMA", length(MDMA_2020$country))))
MDMA_2020.tidy$country <- as.factor(MDMA_2020.tidy$country)
levels(MDMA_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus", 
                                    CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain", 
                                    FI = "Finland", FR = "France", GB = "Great Britain", GR = "Greece", 
                                    HR = "Hungary", IS = "Iceland", IT = "Italy", LT = "Lithuania", 
                                    LV = "Latvia", NL = "Netherlands", NO = "Norway", PL = "Poland", 
                                    PT = "Portugal", SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# methamphetamine
methamphetamine_2020.tidy <- methamphetamine_2020 %>%
  select(country, city, Weekday.mean) %>%
  mutate(year = c(rep(2020, length(methamphetamine_2020$country)))) %>%
  mutate(drug = c(rep("methamphetamine", length(methamphetamine_2020$country))))
methamphetamine_2020.tidy$country <- as.factor(methamphetamine_2020.tidy$country)
levels(methamphetamine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", 
                                               CY = "Cyprus", CZ = "Czech Republic", DE = "Germany", 
                                               DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France", 
                                               GB = "Great Britain", GR = "Greece", HR = "Hungary", 
                                               IS = "Iceland", IT = "Italy", LT = "Lithuania", LV = "Latvia", 
                                               NL = "Netherlands", NO = "Norway",  PL = "Poland", PT = "Portugal", 
                                               SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# 2020 annual summary
drug_2020 <- rbind(amphetamine_2020.tidy, cannabis_2020.tidy, cocaine_2020.tidy, MDMA_2020.tidy, 
                   methamphetamine_2020.tidy)


# total summary
drug <- rbind(drug_2011, drug_2012, drug_2013, drug_2014, drug_2015, drug_2016, drug_2017, drug_2018, 
              drug_2019, drug_2020)

# merge with rbind()
drug.tidy <- drug[drug$country == "Austria" | drug$country == "Belgium" | drug$country == "Finland" | 
               drug$country == "Germany" | drug$country == "Sweden" | drug$country == "Switzerland" | 
               drug$country == "Spain" | drug$country == "Slovenia",]
# subset 8 countries of interest

drug.tidy$year <- as.integer(drug.tidy$year) # transform year as integer
drug.tidy$drug <- as.factor(drug.tidy$drug) # transform drug as factor

head(drug.tidy); str(drug.tidy) # check the data
##    country         city Weekday.mean year        drug
## 1  Belgium Antwerp Zuid       239.64 2011 amphetamine
## 2  Belgium     Brussels        24.63 2011 amphetamine
## 4    Spain    Barcelona        11.31 2011 amphetamine
## 5    Spain    Castellon         0.00 2011 amphetamine
## 6    Spain     Santiago        30.58 2011 amphetamine
## 14  Sweden         Umea        16.57 2011 amphetamine
## 'data.frame':    1401 obs. of  5 variables:
##  $ country     : Factor w/ 40 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
##  $ city        : chr  "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
##  $ Weekday.mean: num  239.6 24.6 11.3 0 30.6 ...
##  $ year        : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ drug        : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...

4.2 Population Age Data

Next, we tidied the population age (PctYouth) data. These data were already formatted to include the variables we were interested in: reference country, reference date, and PctYouth. Thus, we tidied the PctYouth data in three simple steps. First, we renamed the variables; next we reformatted the data types; and finally we removed the observations outside the scope of our study.

# step 1: rename variables 
raw.PctYouth.tidy <- raw.PctYouth # create dataframe
colnames(raw.PctYouth.tidy) <- c("country", "year", "PctYouth") # fix col names
# age = percentage of population between the ages of 15 and 24

# step 2: reformat data
raw.PctYouth.tidy$PctYouth <- as.numeric(raw.PctYouth.tidy$PctYouth) # make PctYouth numeric
# NAs produced correspond to reference countries that we will remove (outside of scope of study)
raw.PctYouth.tidy$year <- as.integer(raw.PctYouth.tidy$year) # make year an integer 
raw.PctYouth.tidy$country <- as.factor(raw.PctYouth.tidy$country)

# step 3: remove observations
PctYouth <- raw.PctYouth.tidy[raw.PctYouth.tidy$country == "Austria" | raw.PctYouth.tidy$country == "Belgium" |
                      raw.PctYouth.tidy$country == "Finland" | raw.PctYouth.tidy$country == "Germany" |
                      raw.PctYouth.tidy$country == "Sweden" | raw.PctYouth.tidy$country == "Switzerland" |
                      raw.PctYouth.tidy$country == "Spain" | raw.PctYouth.tidy$country == "Slovenia",]
# subset only the 8 reference countries we're interested in
PctYouth.tidy <- PctYouth[PctYouth$year == 2011 | PctYouth$year == 2012 | PctYouth$year == 2013 | PctYouth$year == 2014 |
                  PctYouth$year == 2015 | PctYouth$year == 2016 | PctYouth$year == 2017 | PctYouth$year == 2018 |
                    PctYouth$year == 2019 | PctYouth$year == 2020, ]
# subset only the reference dates we're interested in

# check the (tidied) PctYouth data
head(PctYouth.tidy); str(PctYouth.tidy) 
##       country year PctYouth
## 15895 Finland 2011     12.3
## 15896 Finland 2012     12.2
## 15897 Finland 2013     12.1
## 15898 Finland 2014     12.0
## 15899 Finland 2015     11.9
## 15900 Finland 2016     11.7
## 'data.frame':    80 obs. of  3 variables:
##  $ country : Factor w/ 255 levels "Afghanistan",..: 79 79 79 79 79 79 79 79 79 79 ...
##  $ year    : int  2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 ...
##  $ PctYouth: num  12.3 12.2 12.1 12 11.9 11.7 11.5 11.3 11.1 11 ...

4.3 Population Wealth Data

Then, we tidied the population wealth (GDP) data. We imported these data as a long-wise table in Section 3.2. Thus, we tidied the GDP data in three steps. First, we renamed the GDP columns and created a subset of our variables of interest. Next, we removed the observations that were outside the scope of our study. Finally, we reformatted the variable types.

# step 1: rename cols and subset vars of interest
raw.gdp.temp <- na.omit(raw.gdp) #create dataframe
colnames(raw.gdp.temp) <- c("country", "country.code", "indicator.name", "indicator.code", "year", "GDP")
# update column names
raw.gdp.tidy <- raw.gdp.temp %>% select(country, year, GDP)
# subset country, year, GDP

# step 2: remove observations
gdp <- raw.gdp.tidy[raw.gdp.tidy$country == "Austria" | raw.gdp.tidy$country == "Belgium" | 
                     raw.gdp.tidy$country == "Finland" | raw.gdp.tidy$country == "Germany" |
                     raw.gdp.tidy$country == "Sweden" | raw.gdp.tidy$country == "Switzerland" |
                     raw.gdp.tidy$country == "Spain" | raw.gdp.tidy$country == "Slovenia",]
# use logical statements to subset 8 countries from sampling frame
gdp.tidy <- as.data.frame(gdp[gdp$year == "X2011" | gdp$year == "X2012" | gdp$year == "X2013" | gdp$year == "X2014" |
                  gdp$year == "X2015" | gdp$year == "X2016" |
                    gdp$year == "X2017" | gdp$year == "X2018" |
                    gdp$year == "X2019" | gdp$year == "X2020",])
# use logical statements to subset 8 years from sampling frame 
gdp.tidy$year <- as.factor(gdp.tidy$year) # transform year to factor to fix formatting of values
levels(gdp.tidy$year) <- c(2011, 2012, 2013, 2014, 2015, 2016, 2017, 
                           2018, 2019, 2020) # redefine values
gdp.tidy$year <- as.integer(gdp.tidy$year) # transform back to integer
gdp.tidy["year"] <- gdp.tidy$year + 2010 # add 2010 to fix misread values

# step 3: reformat vars
gdp.tidy$country <- as.factor(gdp.tidy$country) 
# transform country to factor
gdp.tidy$year <- as.integer(gdp.tidy$year)
# transform year to integer

# check the data
head(gdp.tidy); str(gdp.tidy) 
##     country year         GDP
## 684 Austria 2011 4.31120e+11
## 685 Austria 2012 4.09425e+11
## 686 Austria 2013 4.30069e+11
## 687 Austria 2014 4.41996e+11
## 688 Austria 2015 3.81818e+11
## 689 Austria 2016 3.95569e+11
## 'data.frame':    80 obs. of  3 variables:
##  $ country: Factor w/ 8 levels "Austria","Belgium",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year   : int  2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 ...
##  $ GDP    : num  4.31e+11 4.09e+11 4.30e+11 4.42e+11 3.82e+11 ...
##  - attr(*, "na.action")= 'omit' Named int [1:3464] 1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "names")= chr [1:3464] "1" "2" "3" "4" ...

4.4 Assembling the Data

Finally, the last step of tidying the data was to merge the three variables we tidied – drug usage, PctYouth, and GDP – into a single dataframe. Since each dataframe included information about the reference country and reference year corresponding to each observation, we merged the dataframes by a combination of reference country and reference year (i.e. we will match values of age, GDP, and drug usage from each of the respective dataframes by the variables reference country and reference year).

data.temp <- (left_join(y = PctYouth.tidy, x = drug.tidy, by = c('country', 'year'))) 
# merge 1: age and drug usage data
data <- (left_join(y = gdp.tidy, x = data.temp, by = c('country', 'year'))) 
# merge 2: result of merge 1 and GDP data

head(data); str(data) # check the data
##   country         city Weekday.mean year        drug PctYouth         GDP
## 1 Belgium Antwerp Zuid       239.64 2011 amphetamine     12.1 5.22646e+11
## 2 Belgium     Brussels        24.63 2011 amphetamine     12.1 5.22646e+11
## 3   Spain    Barcelona        11.31 2011 amphetamine     10.2 1.47877e+12
## 4   Spain    Castellon         0.00 2011 amphetamine     10.2 1.47877e+12
## 5   Spain     Santiago        30.58 2011 amphetamine     10.2 1.47877e+12
## 6  Sweden         Umea        16.57 2011 amphetamine     13.5 5.74094e+11
## 'data.frame':    1401 obs. of  7 variables:
##  $ country     : Factor w/ 269 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
##  $ city        : chr  "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
##  $ Weekday.mean: num  239.6 24.6 11.3 0 30.6 ...
##  $ year        : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ drug        : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ PctYouth    : num  12.1 12.1 10.2 10.2 10.2 13.5 12.1 10.2 10.2 10.2 ...
##  $ GDP         : num  5.23e+11 5.23e+11 1.48e+12 1.48e+12 1.48e+12 ...

5.0 Data Exploration

In the last section, we tidied our data so that it was properly formatted for analysis. Before we proceed with our analysis, it was important to first look at the distributions of variables in our data and to examine the relationships that existed between our variables. We do this first with the drug usage data in Section 5.1, next for the PctYouth data in Section 5.2, and then for the GDP data in Section 5.3. Finally, in Section 5.4, we finalize the data we will use in our analysis, presented in Section 6.

5.1 Drug Usage

Drug usage is a quantitative, continuously distributed variable. Thus, we created a series of 3 boxplots to assess the normality and distribution of these data. First, in Figure 5.1a, we present a boxplot of drug usage; in Figure 5.1b, we present a boxplot of drug usage ~ reference country; finally, in Figure 5.1c, we present a boxplot of drug use ~ reference drug. It is important to note that the distribution of drug usage is heavily right skewed (Figure 5.1a), and a number of potential outliers are shown by open circles in Figures 5.1a - 5.1c. Thus, we chose to log-transform drug usage to create a new variable, log(drug usage).

boxplot(data$Weekday.mean, col = NULL, main = "Figure 5.1a", 
        ylab = "Drug Usage (mg/1000 people/day)", range = 1.5) 

bwplot(data$Weekday.mean~data$country, strip = strip.custom(bg = 'white'),   
       cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
       main = "Figure 5.1b",
       xlab = "Reference Country", ylab = "Drug Usage (mg/1000 people/day)",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

bwplot(data$Weekday.mean~data$drug, strip = strip.custom(bg = 'white'),   
       cex = .5, layout = c(1, 1),
       main = "Figure 5.1c",
       xlab = "Reference Drug", ylab = "Drug Usage (mg/1000 people/day)",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

Next, we examine the distributions of log(drug use). log(drug use) is presented in Figure 5.1d, log(drug use) ~ reference country is presented in Figure 6.1e, and log(drug use) ~ reference drug is presented in Figure 5.1f. As shown in Figure 5.1d, there are no potential outliers identified in the boxplot of log(drug use). Although several potential outliers are highlighted in the below plots with open circles, these points are relatively few in number and small in magnitude when compared to distributions of the drug use data before the log-transformation (Figures 5.1a-c). Thus, these plots suggest the assumption is met for the log(drug use) data.

Z <- data$Weekday.mean+10 # to avoid NA errors
temp <- log(10 + data$Weekday.mean)
data["log.Weekday.mean"] <- temp
data$log.Weekday.mean[is.na(data$log.Weekday.mean)] <- 0

boxplot(data$log.Weekday.mean, col = NULL, main = "Figure 5.1d", 
        ylab = "Drug Usage (log(mg/1000 people/day))", range = 1.5) 

bwplot(data$log.Weekday.mean~data$country, strip = strip.custom(bg = 'white'),   
       cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
       main = "Figure 5.1e",
       xlab = "Reference Country", ylab = "Drug Usage (log(mg/1000 people/day))",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

bwplot(data$log.Weekday.mean~data$drug, strip = strip.custom(bg = 'white'),   
       cex = .5, layout = c(1, 1),
       main = "Figure 5.1f",
       xlab = "Reference Drug", ylab = "Drug Usage (log(mg/1000 people/day))",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

5.2 PctYouth

Like drug usage, PctYouth is a quantitative, continuously distributed variable. Thus, we created a series of 3 boxplots to assess the normality and distribution of these data. Next, we examine the distributions of PctYouth. PctYouth is presented in Figure 5.2a, PctYouth ~ reference country is presented in Figure 5.2b, and PctYouth ~ reference drug is presented in Figure 5.4c. As shown in Figure 5.4c, there is one potential outliers identified in the boxplot of PctYouth. Again, these potential outliers are highlighted with open circles, and these points are relatively few in number and small in magnitude. Thus, these plots suggest the assumption is met for the PctYouth data.

boxplot(data$PctYouth, col = NULL, main = "Figure 5.2a", 
        ylab = "Population Aged 15-24 (Percent)", range = 1.5) 

bwplot(data$PctYouth~data$country, strip = strip.custom(bg = 'white'),   
       cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
       main = "Figure 5.2b",
       xlab = "Reference Country", ylab = "Population Aged 15-24 (Percent)",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

bwplot(data$PctYouth~data$drug, strip = strip.custom(bg = 'white'),   
       cex = .5, layout = c(1, 1),
       main = "Figure 5.2c",
       xlab = "Reference Drug", ylab = "Population Aged 15-24 (Percent)",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

5.3 GDP

Like drug usage and PctYouth, GDP is a quantitative, continuously distributed variable. Thus, we created a series of 3 boxplots to assess the normality and distribution of these data. First, in Figure 5.3a, we present a boxplot of GDP. Next, in Figure 5.3b, we present a boxplot of GDP ~ reference country. Finally, in Figure 5.3c, we present a boxplot of GDP ~ reference drug. It is important to note that the distribution of GDP is right skewed (Figure 5.3a), and a number of potential outliers are shown by open circles in Figures 5.3a - 5.3c. Thus, we chose to log-transform GDP to create a new variable, log(GDP).

boxplot(data$GDP, col = NULL, main = "Figure 5.3a", 
        ylab = "GDP (USD)", range = 1.5) 

bwplot(data$GDP~data$country, strip = strip.custom(bg = 'white'),   
       cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
       main = "Figure 5.3b",
       xlab = "Reference Country", ylab = "(USD)",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

bwplot(data$GDP~data$drug, strip = strip.custom(bg = 'white'),   
       cex = .5, layout = c(1, 1),
       main = "Figure 5.3c",
       xlab = "Reference Drug", ylab = "(USD)",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same")))  

Finally, we assess the normality of log(GDP). log(GDP) is presented in Figure 5.3d, log(GDP) ~ reference country is presented in Figure 5.3e, and log(GDP) ~ reference drug is presented in Figure 5.3f. As shown in Figure 5.3d, there are no potential outliers identified in the boxplot of PctYouth. Potential outliers shown in Figure 5.3e and Figure 5.3f, highlighted with open circles, are relatively few in number and small in magnitude. Thus, these plots suggest the assumption is met for the log(GDP) data.

Y <- data$GDP
temp2 <- log(data$GDP)
data["log.GDP"] <- temp2
data$log.GDP[is.na(data$log.GDP)] <- 0

boxplot(data$log.GDP, col = NULL, main = "Figure 5.3d", 
        ylab = "GDP (log(USD))", range = 1.5) 

bwplot(data$log.GDP~data$country, strip = strip.custom(bg = 'white'),   
       cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
       main = "Figure 5.3e",
       xlab = "Reference Country", ylab = "GDP (log(USD))",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same"))) 

bwplot(data$log.GDP~data$drug, strip = strip.custom(bg = 'white'),   
       cex = .5, layout = c(1, 1),
       main = "Figure 5.3f",
       xlab = "Reference Drug", ylab = "GDP (log(USD))",
       par.settings = list(
         box.rectangle = list(col = 1),
         box.umbrella  = list(col = 1),
         plot.symbol = list(col=1),
         plot.symbol   = list(cex = .5)),
       scales = list(x = list(relation = "same"),
                     y = list(relation = "same")))  

5.5 Prepare for Analysis

Thus, in Section 6 we will proceed with our analysis and construct a linear model of log(drug usage) by PctYouth and log(GDP), based on the distributions explored in the above subsections. In this final subsection, we finalize the data for analysis.

data <- data %>% select(-Weekday.mean, -GDP) # remove non-transformed variables

head(data); str(data) # check data
##   country         city year        drug PctYouth log.Weekday.mean  log.GDP
## 1 Belgium Antwerp Zuid 2011 amphetamine     12.1         5.520020 26.98217
## 2 Belgium     Brussels 2011 amphetamine     12.1         3.544720 26.98217
## 3   Spain    Barcelona 2011 amphetamine     10.2         3.059176 28.02223
## 4   Spain    Castellon 2011 amphetamine     10.2         2.302585 28.02223
## 5   Spain     Santiago 2011 amphetamine     10.2         3.703275 28.02223
## 6  Sweden         Umea 2011 amphetamine     13.5         3.279783 27.07606
## 'data.frame':    1401 obs. of  7 variables:
##  $ country         : Factor w/ 269 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
##  $ city            : chr  "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
##  $ year            : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ drug            : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ PctYouth        : num  12.1 12.1 10.2 10.2 10.2 13.5 12.1 10.2 10.2 10.2 ...
##  $ log.Weekday.mean: num  5.52 3.54 3.06 2.3 3.7 ...
##  $ log.GDP         : num  27 27 28 28 28 ...

6.0 Drug Metabolite Analyses

In the following section, we proceeded with our analysis to test our hypothesis that the effect of country on drug usage is a function of PctYouth and GDP. We present our analysis following the Plot -> Model -> Check Assumptions -> Interpret -> Plot Again workflow outlined in Beckerman et al. (2017).

6.1 Amphetamines

In this subsection, we look at amphetamine usage. To perform this analysis, we created a subset of only the amphetamine data in a new dataframe called amphetplot. We present the summary statistics (mean, sample size, and standard error by country) for the new dataframe below.

#Create new dataframe containing only data pertaining to amphetamine
amphetplot <- data %>% 
  filter(drug == "amphetamine") %>%
    select(country, PctYouth, log.Weekday.mean, log.GDP) 

#Create dataframe with mean amphetamine levels, sample size, and standard error by country
plot3 <- amphetplot %>% 
  group_by(country) %>%
    summarise(
      mean = mean(log.Weekday.mean, na.rm = TRUE),
      samplesize = n(),
      SE = std.error(log.Weekday.mean)
    )
plot3
## # A tibble: 8 × 4
##   country      mean samplesize     SE
##   <fct>       <dbl>      <int>  <dbl>
## 1 Belgium      4.68         44 0.125 
## 2 Spain        3.65         63 0.140 
## 3 Sweden       5.16         13 0.327 
## 4 Switzerland  3.43         43 0.0939
## 5 Finland      4.17         66 0.0692
## 6 Germany      4.20         58 0.125 
## 7 Austria      3.10         24 0.0683
## 8 Slovenia     3.12         16 0.177

Next, we present the log(drug usage) (Figure 6.1a), PctYouth (Figure 6.1b), and log(GDP) (Figure 6.1c) data associated with amphetamine as simple point plots. For the first plot of reference country vs amphetamine consumption, Austria appears to consume the least amount of amphetamine, while Belgium and Germany appear to consume the most (Figure 6.1a). The PctYouth vs amphetamine consumption plot appears to show a positive relationship between PctYouth and amphetamine consumption (Figure 6.1b). The log(GDP) vs amphetamine plot does not appear to show any relationship (Figure 6.1c).

#Plot amphetamine data by country
ggplot(data = amphetplot, aes(x = country, y = log.Weekday.mean)) +
  geom_point(size = 4) +
  xlab("Country") +
  ylab("Amphetamine Consumed (log (mg/1000 people/day))") +
  ggtitle("Figure 6.1a") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot amphetamine data by age
ggplot(data = amphetplot, aes(x = PctYouth, y = log.Weekday.mean)) +
  geom_point() +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("Amphetamine Consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.1b") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot amphetamine data by GDP
ggplot(data = amphetplot, aes(x = log.GDP, y = log.Weekday.mean)) +
  geom_point() +
  xlab("GDP(log(USD))") +
  ylab("Amphetamine Consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.1c") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

Then, we fit a linear model to assess our hypothesis that the effect of country on amphetamine usage is a function of PctYouth and log(GDP). To address our hypothesis, we are particularly interested in the interaction effects between reference country : Pct Youth and reference country : log(GDP).

#Create Model  
ampmod <- lm(data = amphetplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP) 

Before we can run the linear model with amphetamine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (Figure 6.1d), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. In the Normal QQ plot (Figure 6.1e), most of the residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (Figure 6.1f) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (Figure 6.1g) shows no obvious influential data points. Thus, we were ready to continue with the linear model for amphetamine.

#Check assumptions
amphetautoplot <- autoplot(ampmod, smooth.colour = NA) + theme_bw()

amphetautoplot[[1]] +
  labs(title = "Figure 6.1d", # add labels
       x = "Amphetamine Model Fitted Values", 
       y = "Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

amphetautoplot[[2]] +
    labs(title = "Figure 6.1e", # add labels
       x = "Amphetamine Model Theoretical Quantities", 
       y = "Standardized Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

amphetautoplot[[3]] +
  labs(title = "Figure 6.1f", # add labels
       x = "Amphetamine Model Fitted Values") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

amphetautoplot[[4]] + 
  labs(title = "Figure 6.1g", # add labels
       x = "Amphetamine Model Factor Level Combination") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

Then, we produce an ANOVA table and summary table to help interpret our model. Most of the variation in amphetamine usage is explained by PctYouth (Mean sq value of 21.27), and some variation is also explained by reference country (Mean sq value of 13.57). log(GDP) and all variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP) and the interaction terms in the following subsection.

#Generate ANOVA table and summary
anova(ampmod)
## Analysis of Variance Table
## 
## Response: log.Weekday.mean
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## country            7  95.004 13.5720 21.9771 < 2.2e-16 ***
## PctYouth           1  21.273 21.2731 34.4476 1.148e-08 ***
## log.GDP            1   0.964  0.9643  1.5615    0.2124    
## country:PctYouth   7   7.311  1.0444  1.6912    0.1106    
## country:log.GDP    7   3.730  0.5328  0.8627    0.5364    
## Residuals        303 187.118  0.6176                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ampmod)
## 
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP + 
##     country:PctYouth + country:log.GDP, data = amphetplot)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1165 -0.5335  0.1072  0.4922  1.8601 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                   30.4618    58.2639   0.523   0.6015  
## countrySpain                  82.8702    78.5346   1.055   0.2922  
## countrySweden               -339.8859   197.4717  -1.721   0.0862 .
## countrySwitzerland            48.2286   153.9444   0.313   0.7543  
## countryFinland               -30.8362    73.2539  -0.421   0.6741  
## countryGermany               -38.4430    87.8976  -0.437   0.6622  
## countryAustria               -26.3494   134.1149  -0.196   0.8444  
## countrySlovenia              127.5023   290.3433   0.439   0.6609  
## PctYouth                      -0.7935     0.4503  -1.762   0.0791 .
## log.GDP                       -0.6145     2.1349  -0.288   0.7737  
## countrySpain:PctYouth         -1.7674     0.8356  -2.115   0.0352 *
## countrySweden:PctYouth        -0.3437     0.5284  -0.651   0.5159  
## countrySwitzerland:PctYouth    0.1084     0.5575   0.194   0.8460  
## countryFinland:PctYouth        0.3875     0.5502   0.704   0.4818  
## countryGermany:PctYouth       -0.3738     1.0795  -0.346   0.7294  
## countryAustria:PctYouth        0.1103     0.8484   0.130   0.8966  
## countrySlovenia:PctYouth       0.8678     4.6630   0.186   0.8525  
## countrySpain:log.GDP          -2.4279     2.8949  -0.839   0.4023  
## countrySweden:log.GDP         12.7411     7.3737   1.728   0.0850 .
## countrySwitzerland:log.GDP    -1.8619     5.5847  -0.333   0.7391  
## countryFinland:log.GDP         0.9663     2.6947   0.359   0.7202  
## countryGermany:log.GDP         1.4590     3.0373   0.480   0.6313  
## countryAustria:log.GDP         0.8552     4.8600   0.176   0.8604  
## countrySlovenia:log.GDP       -5.6808    10.4862  -0.542   0.5884  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7858 on 303 degrees of freedom
## Multiple R-squared:  0.4067, Adjusted R-squared:  0.3617 
## F-statistic: 9.032 on 23 and 303 DF,  p-value: < 2.2e-16

Given the results of the ANOVA table produced below, we then simplify our model to exclude GDP and the variable interactions. These data suggest that there is a significant difference in amphetamine consumption among the European countries examined in this study, with Sweden consuming the most amphetamine and Slovenia consuming the least (F=21.67, df=7, p<0.05). PctYouth also had a significant effect on amphetamine consumption, with a lower proportion of youth in the population being associated with higher amounts of amphetamine usage (F=33.97, df=1, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on amphetamine usage depends on PctYouth and GDP was not supported.

#Recreate model without GDP and interaction terms
ampmod2 <- lm(data = amphetplot, log.Weekday.mean ~ country + PctYouth)

#Generate ANOVA table
anova(ampmod2)
## Analysis of Variance Table
## 
## Response: log.Weekday.mean
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## country     7  95.004 13.5720  21.674 < 2.2e-16 ***
## PctYouth    1  21.273 21.2731  33.973 1.371e-08 ***
## Residuals 318 199.122  0.6262                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(ampmod2)
## 
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth, data = amphetplot)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.33266 -0.56376  0.08089  0.50235  2.07742 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         13.8310     1.5747   8.783  < 2e-16 ***
## countrySpain        -2.5576     0.3044  -8.403 1.47e-15 ***
## countrySweden        0.4142     0.2501   1.656 0.098699 .  
## countrySwitzerland  -1.6067     0.1807  -8.891  < 2e-16 ***
## countryFinland      -0.5321     0.1541  -3.454 0.000627 ***
## countryGermany      -1.3542     0.2178  -6.216 1.60e-09 ***
## countryAustria      -2.1264     0.2215  -9.602  < 2e-16 ***
## countrySlovenia     -3.5184     0.4080  -8.623 3.15e-16 ***
## PctYouth            -0.7876     0.1351  -5.829 1.37e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7913 on 318 degrees of freedom
## Multiple R-squared:  0.3687, Adjusted R-squared:  0.3528 
## F-statistic: 23.21 on 8 and 318 DF,  p-value: < 2.2e-16

Finally, we replot our amphetamine data using predicted values from our linear model by PctYouth in Figure 6.1h.

#Replot with age and country in same plot 
#Make new x values
new.x4 <- expand.grid( 
  country = unique(amphetplot$country),
  PctYouth = seq(from=min(amphetplot$PctYouth), to=max(amphetplot$PctYouth) +1),
  log.GDP = mean(amphetplot$log.GDP))

#New y values predicted from the model
new.y4 <- predict(ampmod2, newdata = new.x4, interval = "confidence")

#Add the new x and y to a data frame called addThese2
addThese4 <- data.frame(new.x4, new.y4)

#Plot age vs cocaine consumption including countries
ggplot(data = addThese4, aes(x = PctYouth, y = fit, group = country, color = country)) +
  geom_point() +
  geom_smooth(method="lm") +
  geom_errorbar(data = addThese4, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("Mean Amphetamine Usage
       (log mg/1000 people/day)") +
  ggtitle("Figure 6.1h") +
  theme_classic() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

6.2 Cocaine

In this subsection, we look at cocaine usage. To perform this analysis, we created a subset of only the cocaine data in a new dataframe called cocaineplot. We present the summary statistics (mean, sample size, and standard error by country) for the cocaine data below.

#Create new dataframe containing only data pertaining to cocaine
cocaineplot <- data %>% 
  filter(drug == "cocaine") %>%
    select(country, PctYouth, log.Weekday.mean, log.GDP) 

#Create dataframe with mean cocaine levels, sample size, and standard error by country
plot2 <- cocaineplot %>% 
  group_by(country) %>%
    summarise(
      mean = mean(log.Weekday.mean, na.rm = TRUE),
      samplesize = n(),
      SE = std.error(log.Weekday.mean)
    )
plot2
## # A tibble: 8 × 4
##   country      mean samplesize     SE
##   <fct>       <dbl>      <int>  <dbl>
## 1 Belgium      5.44         47 0.141 
## 2 Spain        4.52         66 0.190 
## 3 Sweden       3.61         11 0.298 
## 4 Switzerland  6.01         49 0.0681
## 5 Finland      2.77         66 0.0744
## 6 Germany      4.55         58 0.134 
## 7 Austria      4.77         24 0.152 
## 8 Slovenia     5.40         10 0.189

Next, we plot the log(drug) (Figure 6.2a), PctYouth (Figure 6.2b), and log(GDP) (Figure 6.2c) data associated with cocaine, presented as simple point plots. Thus, we present three plots. For the first plot of reference country vs cocaine consumption, Finland appears to consume the least amount of cocaine while Belgium appears to consume the most (Figure 6.2a). The PctYouth vs cocaine consumption plot appears to show a negative relationship between PctYouth and cocaine consumption (Figure 6.2b). The log(GDP) vs cocaine plot does not appear to show any relationship (Figure 6.2c).

#Plot our cocaine data by country
ggplot(data = cocaineplot, aes(x = country, y = log.Weekday.mean)) +
  geom_point(size = 4) +
  xlab("Country") +
  ylab("Cocaine Consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.2a") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot our cocaine data by age
ggplot(data = cocaineplot, aes(x = PctYouth, y = log.Weekday.mean)) +
  geom_point() +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("Cocaine Consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.2b") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot our cocaine data by GDP
ggplot(data = cocaineplot, aes(x = log.GDP, y = log.Weekday.mean)) +
  geom_point() +
  xlab("GDP (log(USD))") +
  ylab("Cocaine Consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.2c") +
  theme_classic() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

Then, we fit a linear model, where we hypothesize that the effect of reference country on cocaine usage is a function of PctYouth and log(GDP), using the variables from the cocaineplot data frame. To address our hypothesis, we are particularly interested the interaction effects reference country : PctYouth and reference country : log(GDP).

#Create Model  
cocmod <- lm(data = cocaineplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP)

Before we can run the linear model with cocaine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (Figure 6.2d), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. In the Normal QQ plot (Figure 6.2e), most of the residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (Figure 6.2f) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (Figure 6.2g) shows no obvious influential data points. Thus, we were ready to continue with the linear model for cocaine.

#Check assumptions
cocautoplot <- autoplot(cocmod, smooth.colour = NA) + theme_bw()

cocautoplot[[1]] +
  labs(title = "Figure 6.2d", # add labels
       x = "Cocaine Model Fitted Values", 
       y = "Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

cocautoplot[[2]] +
    labs(title = "Figure 6.2e", # add labels
       x = "Cocaine Model Theoretical Quantities", 
       y = "Standardized Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

cocautoplot[[3]] +
  labs(title = "Figure 6.2f", # add labels
       x = "Cocaine Model Fitted Values") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

cocautoplot[[4]] + 
  labs(title = "Figure 6.2g", # add labels
       x = "Cocaine Model Factor Level Combination") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

Next, we produce an ANOVA table and summary table to help interpret our model. First looking at the ANOVA table produced below, most of the variation in cocaine usage is explained by reference country (Mean sq value of 52.613), and some variation is also explained by PctYouth (Mean sq value of 7.349), log(GDP) (Mean sq value of 8.156), the interaction between country and PctYouth (Mean sq value of 3.970), and the interaction between country and log(GDP) (Mean sq value of 1.849).

Results from the ANOVA table suggest that there is a significant interaction effect between reference country : PctYouth (F=4.67, df=7, p<0.05) and reference country : log(GDP) (F=2.18, df=7, p<0.05), which suggests that our hypothesis that the effect of reference country on cocaine usage is a function of PctYouth and GDP was supported. Since the interaction effects were significant, the main effects cannot be interpreted alone.

#Generate ANOVA table and summary
anova(cocmod)
## Analysis of Variance Table
## 
## Response: log.Weekday.mean
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## country            7 368.29  52.613 61.9014 < 2.2e-16 ***
## PctYouth           1   7.35   7.349  8.6470  0.003525 ** 
## log.GDP            1   8.16   8.156  9.5961  0.002130 ** 
## country:PctYouth   7  27.79   3.970  4.6705 5.433e-05 ***
## country:log.GDP    7  12.94   1.849  2.1750  0.036263 *  
## Residuals        307 260.93   0.850                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(cocmod)
## 
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP + 
##     country:PctYouth + country:log.GDP, data = cocaineplot)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00355 -0.49919  0.00044  0.52605  2.77911 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -17.54079   68.31162  -0.257 0.797524    
## countrySpain                -261.28924   92.04349  -2.839 0.004831 ** 
## countrySweden                302.93586  268.21411   1.129 0.259588    
## countrySwitzerland           -23.44064  176.64377  -0.133 0.894518    
## countryFinland                29.02660   86.45787   0.336 0.737303    
## countryGermany                31.95908  103.09092   0.310 0.756765    
## countryAustria                37.55696  157.32096   0.239 0.811476    
## countrySlovenia              471.21647  381.57122   1.235 0.217798    
## PctYouth                      -0.76215    0.49820  -1.530 0.127097    
## log.GDP                        1.18184    2.49932   0.473 0.636645    
## countrySpain:PctYouth          3.54039    0.89985   3.934 0.000103 ***
## countrySweden:PctYouth         0.50043    0.59868   0.836 0.403869    
## countrySwitzerland:PctYouth    0.22038    0.61075   0.361 0.718477    
## countryFinland:PctYouth       -0.08913    0.62017  -0.144 0.885820    
## countryGermany:PctYouth       -0.67960    1.25419  -0.542 0.588304    
## countryAustria:PctYouth        0.65721    0.97971   0.671 0.502838    
## countrySlovenia:PctYouth      -5.05151    6.61438  -0.764 0.445622    
## countrySpain:log.GDP           8.00772    3.38388   2.366 0.018581 *  
## countrySweden:log.GDP        -11.48903   10.01414  -1.147 0.252159    
## countrySwitzerland:log.GDP     0.76289    6.39506   0.119 0.905121    
## countryFinland:log.GDP        -1.13794    3.17493  -0.358 0.720280    
## countryGermany:log.GDP        -0.99928    3.55957  -0.281 0.779107    
## countryAustria:log.GDP        -1.70790    5.69929  -0.300 0.764633    
## countrySlovenia:log.GDP      -17.17425   13.50627  -1.272 0.204486    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9219 on 307 degrees of freedom
## Multiple R-squared:  0.6193, Adjusted R-squared:  0.5908 
## F-statistic: 21.72 on 23 and 307 DF,  p-value: < 2.2e-16

Finally, we replot our cocaine data using predicted values from our linear model to visualize the interaction. In Figure 6.2h, we plot the cocaine model predicted values by PctYouth. In Figure 6.2i, we plot the cocaine model predicted values by log(GDP).

#Now do the same with age
#Make new x values
new.x2 <- expand.grid( 
  country = unique(cocaineplot$country),
  PctYouth = seq(from=min(cocaineplot$PctYouth), to=max(cocaineplot$PctYouth) +1),
  log.GDP = mean(cocaineplot$log.GDP))

#New y values predicted from the model
new.y2 <- predict(cocmod, newdata = new.x2, interval = "confidence")

#Add the new x and y to a data frame called addThese2
addThese2 <- data.frame(new.x2, new.y2)

#Plot age vs cocaine consumption including countries
ggplot(data = addThese2, aes(x = PctYouth, y = fit, group = country, color = country)) +
  geom_point() +
  geom_smooth(method="lm") +
  geom_errorbar(data = addThese2, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("Mean Cocaine Consumed (log mg/1000 people/day)") +
  ggtitle("Figure 6.2h") +
  theme_classic() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

# Plot by log(GDP) vs cocaine by country to visualize the interaction.
# Make new x values 
new.x1 <- expand.grid( 
  country = unique(cocaineplot$country),
  PctYouth = mean(cocaineplot$PctYouth),
  log.GDP = seq(from=min(cocaineplot$log.GDP), to=max(cocaineplot$log.GDP) +1))

#New y values predicted from the model
new.y1 <- predict(cocmod, newdata = new.x1, interval = "confidence")

#Add the new x and y to a data frame called addThese1
addThese1 <- data.frame(new.x1, new.y1)

#Plot GDP and cocaine consumption by country using predicted values
ggplot(data = addThese1, aes(x = log.GDP, y = fit, group = country, color = country)) +
  geom_point() +
  geom_smooth(method="lm") +
  geom_errorbar(data = addThese1, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
  xlab("log(GDP)") +
  ylab("Mean Cocaine Consumed (log mg/1000 people/day)") +
  ggtitle("Figure 6.2i") +
  theme_classic() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

6.3 MDMA

In the following subsection, we investigate MDMA usage. To perform this analysis, we created a subset of the MDMA data in a new dataframe called MDMAplot. The summary statistics (mean, sample size, and standard error by country) are presented below.

#Create new dataframe containing only data pertaining to MDMA
MDMAplot <- data %>%
  filter(drug == "MDMA") %>%
    select(country, PctYouth, log.Weekday.mean, log.GDP)

#Create dataframe with mean MDMA levels, sample size, and standard error by country
plot5 <- MDMAplot %>% 
  group_by(country) %>%
    summarise(
      mean = mean(log.Weekday.mean, na.rm = TRUE),
      samplesize = n(),
      SE = std.error(log.Weekday.mean)
    )
plot5
## # A tibble: 8 × 4
##   country      mean samplesize     SE
##   <fct>       <dbl>      <int>  <dbl>
## 1 Belgium      3.33         46 0.0860
## 2 Spain        2.98         64 0.0480
## 3 Sweden       2.98         11 0.162 
## 4 Switzerland  3.13         49 0.0407
## 5 Finland      2.98         68 0.0386
## 6 Germany      3.05         58 0.0531
## 7 Austria      2.91         24 0.0495
## 8 Slovenia     2.92         10 0.0874

Next, we present three plots: MDMA consumed by European country (Figure 6.3a), MDMA consumed vs. PctYouth (Figure 6.3b), and MDMA consumed vs. GDP (Figure 6.3c). For the first plot of reference country vs MDMA consumption, Sweden appears to consume the least amount of MDMA while Belgium appears to consume the most (Figure 6.3a). The PctYouth vs MDMA consumption plot does not appear to show any relationship (Figure 6.3b), and neither does the log(GDP) vs MDMA plot (Figure 6.3c).

#Plot MDMA data by country
ggplot(data = MDMAplot, aes(x = country, y = log.Weekday.mean)) +
  geom_point(size = 4) +
  xlab("Country") +
  ylab("MDMA Consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.3a") +
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot MDMA data by age
ggplot(data = MDMAplot, aes(x = PctYouth, y = log.Weekday.mean)) +
  geom_point() +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("MDMA consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.3b") +
  theme_classic() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot MDMA data by GDP
ggplot(data = MDMAplot, aes(x = log.GDP, y = log.Weekday.mean)) +
  geom_point() +
  xlab("GDP (log(USD))") +
  ylab("MDMA consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.3c") +
  theme_classic() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

Then, we fit a linear model, where we hypothesize that the effect of reference country on MDMA usage is a function of PctYouth and GDP, using the variables from the MDMAplot data frame. To address our hypothesis, we are particularly interested the interaction effects between reference country : PctYouth and reference country : log(GDP).

#Create Model  
mdmamod <- lm(data = MDMAplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP)

Before we can run the linear model with MDMA data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (Figure 6.3d), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. The Normal QQ plot (Figure 6.3e) isn’t perfect, the positive residuals seem to be larger than expected, however, most of the other residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (Figure 6.3f) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (Figure 6.3g) shows no obvious influential data points. Thus, we were ready to continue with the linear model for MDMA.

#Check assumptions
MDMAautoplot <- autoplot(mdmamod, smooth.colour = NA) + theme_bw()
MDMAautoplot[[1]] +
  labs(title = "Figure 6.3d", # add labels
       x = "MDMA Model Fitted Values", 
       y = "Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

MDMAautoplot[[2]] +
    labs(title = "Figure 6.3e", # add labels
       x = "MDMA Model Theoretical Quantities", 
       y = "Standardized Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

MDMAautoplot[[3]] +
  labs(title = "Figure 6.3f", # add labels
       x = "MDMA Model Fitted Values") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

MDMAautoplot[[4]] + 
  labs(title = "Figure 6.3g", # add labels
       x = "MDMA Model Factor Level Combination") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

Thus, we produce an ANOVA table and summary table to help interpret our model. First looking at the ANOVA table produced below, most of the variation in MDMA usage is explained by PctYouth (Mean sq value of 4.59). Some variation is also explained by reference country (Mean sq of 0.73). log(GDP) and all variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP) and the interaction terms in the following section.

#Generate ANOVA table and summary
anova(mdmamod)
## Analysis of Variance Table
## 
## Response: log.Weekday.mean
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## country            7  5.127  0.7324  5.1533 1.465e-05 ***
## PctYouth           1  4.593  4.5928 32.3168 3.056e-08 ***
## log.GDP            1  0.293  0.2926  2.0588    0.1523    
## country:PctYouth   7  0.907  0.1296  0.9120    0.4973    
## country:log.GDP    7  0.266  0.0380  0.2673    0.9662    
## Residuals        306 43.488  0.1421                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mdmamod)
## 
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP + 
##     country:PctYouth + country:log.GDP, data = MDMAplot)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7918 -0.2443 -0.0384  0.1841  1.3732 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                  21.08331   28.31814   0.745   0.4571  
## countrySpain                -13.60951   37.94324  -0.359   0.7201  
## countrySweden                12.51519  109.77401   0.114   0.9093  
## countrySwitzerland          -18.52768   72.38106  -0.256   0.7981  
## countryFinland               -1.77287   35.40797  -0.050   0.9601  
## countryGermany               -2.05565   42.41088  -0.048   0.9614  
## countryAustria               28.60916   64.49811   0.444   0.6577  
## countrySlovenia             139.90580  156.09754   0.896   0.3708  
## PctYouth                     -0.46911    0.20502  -2.288   0.0228 *
## log.GDP                      -0.45609    1.03823  -0.439   0.6608  
## countrySpain:PctYouth        -0.35218    0.37198  -0.947   0.3445  
## countrySweden:PctYouth        0.09349    0.24589   0.380   0.7041  
## countrySwitzerland:PctYouth   0.26412    0.25080   1.053   0.2931  
## countryFinland:PctYouth       0.12617    0.25063   0.503   0.6150  
## countryGermany:PctYouth      -0.15621    0.51337  -0.304   0.7611  
## countryAustria:PctYouth       0.45198    0.40127   1.126   0.2609  
## countrySlovenia:PctYouth     -0.58775    2.70478  -0.217   0.8281  
## countrySpain:log.GDP          0.58043    1.39615   0.416   0.6779  
## countrySweden:log.GDP        -0.50944    4.09896  -0.124   0.9012  
## countrySwitzerland:log.GDP    0.56165    2.62139   0.214   0.8305  
## countryFinland:log.GDP       -0.01347    1.30349  -0.010   0.9918  
## countryGermany:log.GDP        0.13117    1.46699   0.089   0.9288  
## countryAustria:log.GDP       -1.28246    2.33766  -0.549   0.5837  
## countrySlovenia:log.GDP      -5.55083    5.52587  -1.005   0.3159  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.377 on 306 degrees of freedom
## Multiple R-squared:  0.2046, Adjusted R-squared:  0.1448 
## F-statistic: 3.422 on 23 and 306 DF,  p-value: 5.187e-07

Based on the results of the ANOVA table shown below, we simplify our model to exclude log(GDP) and the interaction terms.Results from the ANOVA table suggest that there is a significant difference in MDMA consumption among the European countries examined in this study, with Belgium consuming the most MDMA and Slovenia consuming the least (F=5.23, df=7, p<0.05). Age also had a significant effect on MDMA consumption, with a lower proportion of youth in the population being associated with higher amounts of MDMA usage (F=32.80, df=1, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on MDMA usage is a function of PctYouth and log(GDP) was not supported.

#Recreate model without GDP, age, and interaction terms
MDMAmod2 <- lm(data = MDMAplot, log.Weekday.mean ~ country + PctYouth)

#Generate ANOVA table
anova(MDMAmod2)
## Analysis of Variance Table
## 
## Response: log.Weekday.mean
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## country     7  5.127  0.7324  5.2296 1.151e-05 ***
## PctYouth    1  4.593  4.5928 32.7956 2.354e-08 ***
## Residuals 321 44.954  0.1400                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(MDMAmod2)
## 
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth, data = MDMAplot)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.79226 -0.26697 -0.04389  0.20348  1.39415 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         7.29886    0.69583  10.489  < 2e-16 ***
## countrySpain       -1.01336    0.13695  -7.399 1.21e-12 ***
## countrySweden      -0.21835    0.12769  -1.710   0.0882 .  
## countrySwitzerland -0.33219    0.08045  -4.129 4.65e-05 ***
## countryFinland     -0.35859    0.07149  -5.016 8.74e-07 ***
## countryGermany     -0.66749    0.10025  -6.659 1.20e-10 ***
## countryAustria     -0.65860    0.10354  -6.361 6.91e-10 ***
## countrySlovenia    -1.25628    0.19800  -6.345 7.57e-10 ***
## PctYouth           -0.34097    0.05954  -5.727 2.35e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3742 on 321 degrees of freedom
## Multiple R-squared:  0.1778, Adjusted R-squared:  0.1573 
## F-statistic: 8.675 on 8 and 321 DF,  p-value: 9.954e-11

In Figure 6.3h, we replot our MDMA data using predicted values from our linear model by PctYouth.

#Plot PctYouth and country vs mean MDMA consumption
#Make new x values
new.x7 <- expand.grid( 
  country = unique(MDMAplot$country),
  PctYouth = seq(from=min(MDMAplot$PctYouth), to=max(MDMAplot$PctYouth) +1),
  log.GDP = mean(MDMAplot$log.GDP))

#New y values predicted from the model
new.y7 <- predict(MDMAmod2, newdata = new.x7, interval = "confidence")

#Add the new x and y to a data frame called addThese7
addThese7 <- data.frame(new.x7, new.y7)

#Plot age vs cocaine consumption including countries
ggplot(data = addThese7, aes(x = PctYouth, y = fit, group = country, color = country)) +
  geom_point() +
  geom_smooth(method="lm") +
  geom_errorbar(data = addThese7, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("Mean MDMA Consumed (log(mg/1000 people/day))") +
  ggtitle("Figure 6.3h") +
  theme_classic() + 
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

6.4 Methamphetamine

Finally, in this last subsection, we look at methamphetamine usage. To perform this analysis, we created a subset of only the methamphetamine usage data in a new dataframe called methplot. Summary statistics (mean, sample size, and standard error by country) are shown below.

#Create new dataframe containing only data pertaining to methamphetamine
methplot <- data %>% 
  filter(drug == "methamphetamine") %>%
    select(country, PctYouth, log.Weekday.mean, log.GDP) 

#Create table with mean methamphetamine levels, sample size, and standard error by country
plot4 <- methplot %>% 
  group_by(country) %>%
    summarise(
      mean = mean(log.Weekday.mean, na.rm = TRUE),
      samplesize = n(),
      SE = std.error(log.Weekday.mean)
    )
plot4
## # A tibble: 8 × 4
##   country      mean samplesize     SE
##   <fct>       <dbl>      <int>  <dbl>
## 1 Belgium      2.65         47 0.0611
## 2 Spain        2.64         66 0.0622
## 3 Sweden       2.63         15 0.133 
## 4 Switzerland  3.09         43 0.0946
## 5 Finland      2.97         64 0.0692
## 6 Germany      3.49         58 0.172 
## 7 Austria      2.51         24 0.0308
## 8 Slovenia     2.56         11 0.150

Then, we plot the log(drug usage) (Figure 6.4a), PctYouth (Figure 6.4b), and log(GDP) (Figure 6.4c) data associated with methamphetamine, presented as simple point plots. For the first plot of reference country vs methamphetamine consumption, Austria appears to consume the least amount of methamphetamine, while Germany appears to consume the most (Figure 6.4a). The PctYouth vs methamphetamine consumption plot does not appear to show any relationship (Figure 6.4b). The log(GDP) vs methamphetamine plot looks like there may be a positive relationship between log(GDP) and methamphetamine consumption (Figure 6.4c).

#Plot methamphetamine data by country
ggplot(data = methplot, aes(x = country, y = log.Weekday.mean)) +
  geom_point(size = 4) +
  xlab("Country") +
  ylab("Methamphetamine Usage 
       (log(mg/1000 people/day))") +
  ggtitle("Figure 6.3a") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot methamphetamine data by age
ggplot(data = methplot, aes(x = PctYouth, y = log.Weekday.mean)) +
  geom_point() +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("Methamphetamine Usage 
       (log(mg/1000 people/day))") +
  ggtitle("Figure 6.3b") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

#Plot methamphetamine data by GDP
ggplot(data = methplot, aes(x = log.GDP, y = log.Weekday.mean)) +
  geom_point() +
  xlab("GDP (log(USD))") +
  ylab("Methamphetamine Usage 
       (log(mg/1000 people/day))") +
  ggtitle("Figure 6.3c") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

Next, we fit a linear model to assess our hypothesis that the effect of reference country on methamphetamine usage is a function of PctYouth and log(GDP). We are particularly interested the interaction effects between reference country : PctYouth and reference country : log(GDP).

#Create Model  
methmod <- lm(data = methplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP)

Before we can run the linear model with methamphetamine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (top left), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. The Normal QQ plot (top right) isn’t perfect, the positive residuals seem to be larger than expected, however, most of the other residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (bottom left) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (bottom right) shows no obvious influential data points. Thus, we were ready to continue with the linear model for methamphetamine.

#Check assumptions
methautoplot <- autoplot(methmod, smooth.colour = NA) + theme_bw()
methautoplot[[1]] +
  labs(title = "Figure 6.4d", # add labels
       x = "Methamphetamine Model Fitted Values", 
       y = "Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

methautoplot[[2]] +
    labs(title = "Figure 6.4e", # add labels
       x = "Methamphetamine Model Theoretical Quantities", 
       y = "Standardized Resiudals") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

methautoplot[[3]] +
  labs(title = "Figure 6.4f", # add labels
       x = "Methamphetamine Model Fitted Values") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

methautoplot[[4]] + 
  labs(title = "Figure 6.4g", # add labels
       x = "Methamphetamine Model Factor Level Combination") +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

We produce an ANOVA table and summary table to help interpret our model. First looking at the ANOVA table produced below, most of the variation in methamphetamine usage is explained by reference country (Mean sq value of 5.03). A significant amount of variation was also explained by the interaction between reference country and PctYouth (Mean sq value of 1.04). While log(GDP), PctYouth, and all other variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP), PctYouth and the interaction between reference country and log(GDP).

#Generate ANOVA table and summary
anova(methmod)
## Analysis of Variance Table
## 
## Response: log.Weekday.mean
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## country            7  35.208  5.0298 10.1483 2.027e-11 ***
## PctYouth           1   1.346  1.3458  2.7153   0.10043    
## log.GDP            1   0.050  0.0500  0.1010   0.75088    
## country:PctYouth   7   7.250  1.0357  2.0897   0.04446 *  
## country:log.GDP    7   4.839  0.6913  1.3948   0.20690    
## Residuals        304 150.671  0.4956                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(methmod)
## 
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP + 
##     country:PctYouth + country:log.GDP, data = methplot)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5017 -0.3675 -0.1477  0.2276  2.3204 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                  -8.86653   52.16471  -0.170   0.8651  
## countrySpain                -23.93406   70.28705  -0.341   0.7337  
## countrySweden               374.61026  199.45175   1.878   0.0613 .
## countrySwitzerland           32.45723  137.16714   0.237   0.8131  
## countryFinland               94.93113   66.12107   1.436   0.1521  
## countryGermany              -30.42431   78.72319  -0.386   0.6994  
## countryAustria               42.66473  120.13481   0.355   0.7227  
## countrySlovenia             148.62411  281.76235   0.527   0.5982  
## PctYouth                     -0.41878    0.38044  -1.101   0.2719  
## log.GDP                       0.60808    1.90855   0.319   0.7502  
## countrySpain:PctYouth         0.10361    0.68715   0.151   0.8803  
## countrySweden:PctYouth        0.82704    0.43764   1.890   0.0597 .
## countrySwitzerland:PctYouth  -0.22490    0.47120  -0.477   0.6335  
## countryFinland:PctYouth       0.07291    0.48399   0.151   0.8804  
## countryGermany:PctYouth      -1.68455    0.95773  -1.759   0.0796 .
## countryAustria:PctYouth       0.06624    0.74813   0.089   0.9295  
## countrySlovenia:PctYouth     -3.31343    4.78856  -0.692   0.4895  
## countrySpain:log.GDP          0.77175    2.58403   0.299   0.7654  
## countrySweden:log.GDP       -14.21898    7.43586  -1.912   0.0568 .
## countrySwitzerland:log.GDP   -1.09308    4.97312  -0.220   0.8262  
## countryFinland:log.GDP       -3.61590    2.42630  -1.490   0.1372  
## countryGermany:log.GDP        1.63310    2.71819   0.601   0.5484  
## countryAustria:log.GDP       -1.63186    4.35215  -0.375   0.7080  
## countrySlovenia:log.GDP      -4.77991   10.02886  -0.477   0.6340  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.704 on 304 degrees of freedom
## Multiple R-squared:  0.2442, Adjusted R-squared:  0.1871 
## F-statistic: 4.272 on 23 and 304 DF,  p-value: 1.546e-09

Given the results of the ANOVA table produced below, we simplify our model to exclude PctYouth, log(GDP), and the interaction between reference country and log(GDP). Results from the ANOVA table suggest that there is a significant interaction effect between reference country and PctYouth (F=2.14, df=7, p<0.05), which partially supports our hypothesis that the effect of reference country on methamphetamine usage is a function of PctYouth and log(GDP).

#Recreate model without GDP, age, and interaction terms
methmod2 <- lm(data = methplot, log.Weekday.mean ~ country + country:PctYouth)

#Generate ANOVA table
anova(methmod2)
## Analysis of Variance Table
## 
## Response: log.Weekday.mean
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## country            7  35.208  5.0298 10.0853 2.237e-11 ***
## country:PctYouth   8   8.555  1.0694  2.1442   0.03162 *  
## Residuals        312 155.601  0.4987                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(methmod2)
## 
## Call:
## lm(formula = log.Weekday.mean ~ country + country:PctYouth, data = methplot)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4471 -0.3624 -0.1889  0.2485  2.3507 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                   7.6945     4.4119   1.744  0.08214 . 
## countrySpain                 -3.4146     6.8726  -0.497  0.61965   
## countrySweden                -6.5247     4.7804  -1.365  0.17327   
## countrySwitzerland            2.5015     5.2129   0.480  0.63166   
## countryFinland               -2.9377     5.5053  -0.534  0.59398   
## countryGermany               20.4933    10.0167   2.046  0.04160 * 
## countryAustria               -2.1745     7.6777  -0.283  0.77720   
## countrySlovenia              15.4100    30.0315   0.513  0.60823   
## countryBelgium:PctYouth      -0.4336     0.3788  -1.145  0.25321   
## countrySpain:PctYouth        -0.1688     0.5433  -0.311  0.75621   
## countrySweden:PctYouth        0.1250     0.1568   0.797  0.42580   
## countrySwitzerland:PctYouth  -0.6298     0.2458  -2.562  0.01088 * 
## countryFinland:PctYouth      -0.1542     0.2844  -0.542  0.58807   
## countryGermany:PctYouth      -2.3498     0.8554  -2.747  0.00636 **
## countryAustria:PctYouth      -0.2755     0.5748  -0.479  0.63206   
## countrySlovenia:PctYouth     -2.2461     3.2481  -0.692  0.48976   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7062 on 312 degrees of freedom
## Multiple R-squared:  0.2195, Adjusted R-squared:  0.182 
## F-statistic:  5.85 on 15 and 312 DF,  p-value: 1.111e-10

In Figure 6.4h, we replot our methamphetamine model predicted values by PctYouth to visualize the interaction.

#Replot with age and country in same plot 
#Make new x values
new.x5 <- expand.grid( 
  country = unique(methplot$country),
  PctYouth = seq(from=min(methplot$PctYouth), to=max(methplot$PctYouth) +1),
  log.GDP = mean(methplot$log.GDP))

#New y values predicted from the model
new.y5 <- predict(methmod2, newdata = new.x5, interval = "confidence")

#Add the new x and y to a data frame called addThese5
addThese5 <- data.frame(new.x5, new.y5)

#Plot age vs cocaine consumption including countries
ggplot(data = addThese5, aes(x = PctYouth, y = fit, group = country, color = country)) +
  geom_point() +
  geom_smooth(method="lm") +
  geom_errorbar(data = addThese5, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
  xlab("Percent Youth (Ages 15-24)") +
  ylab("Mean Methamphetamine Consumed (log mg/1000 people/day)") +
  ggtitle("Figure 6.4h") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  theme(plot.title = element_text(face = "bold")) + # bold title
  theme(text = element_text(size=14))

7.0 References

Bannwarth, A., Morelato, M., Benaglia, L., Been, F., Esseiva, P., Delemont, O. and Roux, C. (2019). The use of wastewater analysis in forensic intelligence: drug consumption comparison between Sydney and different European cities. Forensic Sci. Res. 4, 141.

Beckerman, A. P., Childs, D. Z., & Petchey, O. L. (2017). Getting Started with R: An Introduction for Biologists 2nd Edition. Oxford, UK: Oxford University Press.

Betancourt W. Q., Schmitz B. W., Innes, G. K., Prasek, S. M., & Brown, P. (2021). COVID-19 containment on a college campus via wastewater-based epidemiology, targeted clinical testing and an intervention. Science of The Total Environment, 779, 0048-9697

Chen K., Kandel D. B., Davies M. (1997). Relationships between frequency and quantity of marijuana use and last year proxy dependence among adolescents and adults in the United States. Drug Alcohol Depend, 46, 53–67

Compton, W. M., Cottler, L. B., Phelps, D. L., Abdallah, B. A, & Spitznagel, E. L. (2000). Psychiatric disorders among drug dependent subjects: are they primary or secondary? Am J Addict, 9, 126–134

Concheiro, M., De Castro, A., Quintela, O., Cruz, A. and Rivadulla, M. L. (2007) Determination of Illicit Drugs and their Metabolites in Human Urine by Liquid Chromatography Tandem Mass Spectrometry Including Relative Ion Intensity Criterion. J. Anal. Toxicol. 31,.

Lipari, R. N. and Park-Lee, E. (2019). Key Substance Use and Mental Health Indicators in the United States: Results from the 2018 National Survey on Drug Use and Health.

Metcalf, T. G., Melnick, J. L. and Estes, M. K. (1995). Environmental virology: from detection of virus in sewage and water by isolation to identification by molecular biology–a trip of over 50 years. Annu. Rev. Microbiol. 49, 461–487.

Moshe, S. (2011). GDP as a Measure of Economic Welfare. ICER Working Paper, Retrieved from https://ssrn.com/abstract=1808685

Nieves, C., Cadet -James, Y., Mccalman, J. and Haswell, M. SOCIAL DETERMINANTS OF DRUG USE What works? A review of act ions addressing the social and economic determinants of Indigenous he… Katy Osborne A literature review for Indigenous men’s groups.

Russell, J. M., Newman, S. C., & Bland, R. C. (1994). Epidemiology of psychiatric disorders in Edmonton: Drug abuse and dependence. Acta Psychiatr Scand Suppl, 376, 54–62

Wastewater analysis and drugs — a European multi-city study | www.emcdda.europa.eu.

Zuccato, E., Chiabrando, C., Castiglioni, S., Bagnati, R. and Fanelli, R. (2008). Estimating community drug abuse by wastewater analysis. Environ. Health Perspect. 116, 1027–1032.

8.0 Appendicies

Appendix A

Project Metadata

Drug metabolite data
Intent of data:
The intent of this data set was to determine the level of use of the drugs cocaine, cannabis, methamphetamines, amphetamines, and MDMA across Europe.

Data type:
This dataset takes the form of categorical data.

Data acquisition:
This data was accessed through the European Monitoring Centre for Drugs and Drug addiction website: (https://www.emcdda.europa.eu/publications/html/pods/waste-water-analysis_en#siteInfoTable)

Study site(s):
The study sites in this data set included wastewater treatment plants from 137 cities across 29 European countries.

Note: cities followed by a number in parantheses indicate where an average was found for multiple waste-water treatment plants

Countries and cities:
Austria: Graz, Hall-Wattens, Innsbruck, Kapfenburg, Klosterneuburg, Kufstein, Murzzuschlag, Purgstall, Region Millstattersee, Strass im Zillertal, Wasserverbard Hofsteig
Bosnia: Sarajevo
Belgium: Antwerp Deurne, Antwerp Zuid, Boom, Brugge, Brussels, Geraardsbergen, Koksijde, Merchtem, Ninove, Oostende, Ruisbroek Pruus
Croatia: Zagreb
Cyprus: Limassol, Nicosia
Czech Republic: Brno, Budweis, Karlovy Vary, Ostrava, Prague
Denmark: Copenhagen
Finland: Espoo, Helsinki, Hameenlinna, Joensuu, Jyvaskyla, Kajaani, Kemi, Kokkola, Kotka, Kouvola, Kuopio, Lahti, Lappeenranta, Mariehamn, Mikkeli, Oulu, Pietarsaari, Pori, Rauma, Rovaniemi, Salo, Savonlinna, Seinajoki, Tampere, Turku, Vihti
France: Bordeux I, Fort-de-France P, Paris Seine Centre, Paris South Suburbs
Germany: Berlin (4), Chemnitz, Dortmund, Dresdend, Dulmen, Erfurt, Frankfurt (3), Hamburg (2), Hannover (2), Magdeburg, Mainz, Munich Gut_Grosslapen, Nuremburg, Rostock, Saarbrücken (2), Stuttgart
Great Britain: Bristol, London
Greece: Athens, Eleusis, Mytilene, Thessaloniki
Iceland: Reykjavik Klettagardar
Italy: Bozen, Milan
Lithuania: Kaunas, Klaipeda, Vilnius
Latvia: Riga
Malta: Malta
Netherlands: Amsterdam, Eindhoven, Neiuwegein Ijssel, Oudewater, Utrecht
Norway: Oslo
Poland: Krakow P
Portugal: Almada, Lisbon, Porto
Romania: Cluj Napoca
Serbia: Belgrade, Novi Sad
Spain: Barcelona, Castellon, Madrid VdlV, Molina de Segura, Santiago, Valencia (3)
Sweden: Gothenburg, Gavle, Sandviken, Stockholm (2), Soderhamn, Umea, Uppsala
Switzerland: Basel, Berne, Geneva, Lausanne, Lugano, St. Gallen Hofen, Zurich
Slovenia: Domzale-Kamnik, Koper, Ljubljana, Maribor, Novo mesto, Velenje
Slovakia: Bratislava (2), Piestany
Turkey: Adana, Istanbul I, Istanbul (II-VII)

Dates of data collection:
The data was collected over a period of 9 years (from 2011-2020).

Data collection methods:
Samples were collected daily over the above specified time period. An automated sampling device was used to collect 24 composite samples for each hour of the day. Samples were collected between 8am-10am the following day and combined to create a representative 24-h composite sample. The concentrations of target residues (in nanograms/litre) were obtained through off and on-line solid phase extraction with polymeric cartridges, and liquid chromatography coupled to mass spectrometry. The drug use for each city was then estimated by multiplying the concentration of each target drug reside with the corresponding flow of sewage (litre/day) and a correction factor (obtained by considering the average excretion rate of a given drug residue and the molecular mass ratio of the parent drug/metabolite) and then dividing by the population served by the wastewater treatment plant. The resulting number represents the amount of substance consumed per day per 1000 people. Averages of substance consumption were found for the entire week, all weekdays, and the weekend for every year of the study.

Ethical Statement:
Guidelines outlining the main ethical risks for wastewater research and strategies on how to mitigate these risks can be found here: https://score-cost.eu/wp-content/uploads/sites/118/2016/11/WBE-ethical-guidelines-FINAL-March-2016-.pdf.

Preparing the data:
For the purpose of our study, only Countries for which drug metabolite data was available in 6 or more cities were included. This encompossed 8 countries (Austria, Belgium, Finland, Germany, Sweden, Switzerland, Spain, and Slovenia) and 90 cities. Additionally, the time period was narrowed down to 2011-2018. To ensure more even comparisons, this study followed a blocking design where 5 cities were randomly selected from each of the 8 remaining countries. To randomly select 5 cities from each country, first a sampling frame was constructed in excel where each of the 90 cities were numbered and paired with their corresponding country. This sampling frame was input into R studio (with the read.csv() function) in order to create a series of subsamples for each country. To create the subsamples, the function sample() was used with the arguments x (the range of numbers of the cities in the country of interest), size (how many cities should be in each subsample), and replace (can the same city be selected twice). To get the total sample population for our study, we then used the rbind() function to join together the newly created subsamples. The data frame containing the total sample population was then exported as a .csv file using the write.csv() function so that it could be combined with the drug metabolite data for each randomly selected city and used in downstream analysis. For ease of analysis, only the weekday mean of drug use was considered.

GDP Data:
Intent of the data:
To determine the gross domestic product (GDP) of 266 countries/regions around the world.
GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.
Please see the World Bank website for more information on the limitations of using GDP in data analysis.

Data type:
This dataset takes the form of categorical data.

Data acquisition:
The raw data set was accessed via The World Bank website (https://data.worldbank.org/indicator/NY.GDP.MKTP.CD)

Study site(s):
The study sites in this data set are comprised of 266 different countries/regions around the world. Please see https://databank.worldbank.org/reports.aspx?source=2&series=NY.GDP.MKTP.CD&country= for a complete list.

Dates of data collection:
The data was collected yearly for a period of 80 years (from 1960-2020).

Data collection methods:
This dataset was based on World Bank national accounts data and OECD National Accounts data files. Data collection methods include: population censuses, household surveys, national accounts, purchasing power parity surveys, income surveys, and agricultural censuses.
Country data is collected in the local currency units, and then converted to the constant price of the base year as determined by each country. For GDP, these values are then converted into constant 2010 USD with gap filling of missing values. Missing data are imputed based on the relationship of the sum of available data to the total in the year of the previous estimate.

Country Specifics
Austria: Latest population census conducted in 2011, 1 (1999) euro = 13.7603 Austrian schilling, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Belgium: Latest population census conducted in 2011, 1 (1999) euro = 40.3399 Belgian franc, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Finland: Latest population census conducted in 2011, 1 (1999) euro = 5.94573 Finnish markkka, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Germany: Latest population census conducted in 2011, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Sweden: Latest population census conducted in 2011, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Swizterland: Latest population census conducted in 2011, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Spain: Latest population census conducted in 2011, 1 (1999) euro = 166.386 Spanish peseta, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2009.
Slovenia: Latest population census conducted in 2018, 1 (1999) euro = 239.64 Slovenian tolar, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2009, latest household survey conducted in 2003.

Preparing the data:
For accurate comparison, the original data set was narrowed down to only include the countries and years considered by our drug metabolite data.

Age data
Intent of data:
The intent of this dataset was to document the percentage of the population by age group for 235 countries across the world.

Data type:
This data set takes the form of categorical data.

Data acquisition:
This data set was accessed from the United Nations Department of Economic and Social Affair’s website (https://population.un.org/wpp/Download/Standard/Interpolated/).

Study site(s):
The study sites within this data set encompass 235 countries from across the world. Please refer to (https://population.un.org/wpp/Download/Metadata/Documentation/) for a full list of the countries included.

Dates of data collection:
The data was collected annually for a period of 70 years (from 1950-2020).

Data collection methods:
The data collection methods were comprised of a mix of census data, population registers, and official estimates.

Country Specifics
Austria: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1951, 1961, 1971, 1981, 1991, 2001, 2011 censuses; population register through 2017; official estimates through 2017.
Belgium: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1961, 1970, 1981, 1991, 2001, 2011 censuses; population register through 2017; official estimates through 2017.
Finland: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1970, 1975, 1985, 1990, 2000, 2010 censuses; population register through 2017; official estimates through 2017.
Germany: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1987, 2001, 2011 censuses; population register through 2018; official estimates through 2018.
Sweden: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 2011 censuses; population register through 2018; official estimates through 2018.
Swizterland: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1970, 1980, 1990, 2000, 2010 censuses; official estimates through 2017.
Spain: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1970, 1981, 1991, 2001, 2011 censuses; official estimates through 2017.
Slovenia: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1953, 1961, 1971, 1981, 1991, 2002, 2011, 2015 censuses; population register through 2017; official estimates through 2017.

Preparing the data:
For accurate comparison, the original data set was narrowed down to only include the countries and years considered by our drug metabolite and GDP data. For ease of analysis, only information on the percentage of the population within the age group 15-24 (for each Country) was extracted from the original data set. A separate file with the above changes was created in excel for further analysis in R studio.

Codebook:
Units
Drug metabolites (amphetamines, methamphetamines, cocaine, MDMA): mg/1000 people/day
GDP: 2010 USD
Population: Percentage of total population that is 15-24

Extra tools/software:
Github, excel, R Studio

Appendix B

Wastewater Based Epidemiology (WBE)

Wastewater based epidemiology (WBE) is a chemical analysis technique used to detect a myriad of compounds for regions serviced by wastewater treatment facilities. The functions of this technique range from monitoring environmental impact of household liquid waste to the very topical application of detecting the presence of Covid-19 (Glassmeyer et al., 2005), (Bentacourt et al., 2021).

Wastewater data account for response bias by indiscriminately collecting wastewater from a given locality. It is broad in the sense that only the averages for a population can be gleaned and will not reveal the true distribution of drug use for individuals within in a population. However, given the scale, consistency, and reliability of the method, these wastewater data can be paired with other demographic data for multiple locations to tease out potential factors affecting drug consumption (Castiglioni et al., 2014).

In order to estimate levels of drug use from wastewater, researchers first identify and quantify drug residues. Then, concentration is determined by measuring the daily flow rate for the location the sample was taken. Next, a correction factor is applied to each drug to account for changes in structure and quantity after metabolism. The back-calculated total quantity of consumption is then divided by the population for the area serviced by the treatment facility. And finally, an average is produced for each drug consumed. This method is not novel but has never before been applied on such a large geographic and temporal scale. Indeed, this represents a large scale coordinated effort to quantify drug use patterns for critical populations. The consistency exercised throughout the study lends credence to the undertaking, making it perfectly adaptable to further investigations. This dataset therefore served as the primary scaffolding for our own investigation. With so many geographic domains represented through time, we were interested in determining if these wastewater data could be used with other known variables to explore potential relationships between drug use and demographic characteristics of a population.

The countries from this set that we chose to include in this study were; Spain, Sweden, Switzerland, Belgium, Finland, Austria, Germany, and Slovenia.